ABSTRACT



One of a regular series on the status and progress of
research into the nature of speech, instrumentation for its
investigation, and practical applications, this report consists of 17
papers dealing with the following topics: (1) vagueness and fictions
as cornerstones of a theory of perceiving and acting: a commentary on
D.O. Walter; (2) the informational support for upright stance; (3)
determining the extent of coarticulation: effects of experimental
design; (4) the roles of phoneme frequency, similarity, and
availability in the experimental elicitation of speech errors; (5) on
learning to speak; (6) the motor theory of speech perception revised;
(7) linguistic and acoustic correlates of the perceptual structure
found in an individual differences scaling study of vowels; (8)
perceptual coherence of speech: stability of silence-cued stop
consonants; (9) development of the speech perceptuomotor system; (10)
dependence of reading on orthography: investigations in
Serbo-Croatian; (11) the relationship between knowledge of
derivational morphology and spelling ability in fourth, sixth, and
eighth graders; (12) relations among regular and irregular,
morphologically related words in the lexicon as revealed by
repetition priming; (13) grammatical priming of inflected nouns by
the gender of possessive adjectives; (14) grammatical priming of
inflected nouns by inflected adjectives; (15) deaf signers and serial
recall in the visual modality: memory for signs, fingerspelling, and
print; (16) did orthographies evolve? and (17) the development of
children's sensitivity to factors influencing vowel reading. (HOD)
ON VAGUENESS AND FICTIONS AS CORNERSTONES OF A THEORY OF PERCEIVING AND
ACTING: A COMMENT ON WALTER (1983)*



Claudia Carello† and M. T. Turvey††



"I don't want realism. I want magic!" 

Blanche DuBois, Scene 9, A Streetcar Named Desire 

Vagueness or unclarity of thought is considered by Walter (1983) as a
worthy and necessary state of (human) mind for modeling. He appeals to
quantum mechanics (and, in particular, non-pure states) as, perhaps, the only
fruitful model by which to understand such phenomena. The analogy takes the
following form: The clarity that indeterminate ideas derive from rumination
and discussion parallels the reduction of uncertainty in a parameter of a
submicroscopic system that accompanies its quantum measurement. Walter
suggests that with an allowance for quantum-like brain states, brains can be
classified as physical symbol systems (processors that read, write, store, and
compare symbols) of the type described by Newell and Simon (Newell, 1981;
Newell & Simon, 1976; Simon, 1981).

As a revealing aside (developed more fully in Walter, 1980), Walter
(1983) asserts that both scientists' theorizing about perceiving and animals'
perceiving are largely story-telling. His implication seems to be that we
invent fictions that may or may not pertain to what is really going on but, at
least, help us muddle through our laboratories and our environments.
Scientists fashion explanations (in a manner of speaking) in an attempt to
sort out reaction times, thresholds, and so on, while perceivers contrive
hypotheses to sort out patches of color, horizontal lines, and so on. The
story's relation to reality is inconsequential as long as it is useful, where
useful seems to be read as leading to the next (preferably consistent)
fiction. If a fiction loses its usefulness to scientist or perceiver, it can be
replaced with a new one, no more real but, ideally, more useful.

As he rightly points out, Walter's position is in conflict with ecological
realism. Beyond that assessment, however, whatever it is that Walter
describes as ecological realism bears little resemblance to the framework
carved out by Gibson over some 30 years (e.g., Gibson, 1966, 1979, 1982) and
elaborated by others (e.g., Michaels & Carello, 1981; Reed & Jones, 1982; Shaw
& Turvey, 1982; Shaw, Turvey, & Mace, 1983; Turvey & Carello, 1981; Turvey,
Shaw, Reed, & Mace, 1981). In what follows, we shall point out where Walter



*Cognition and Brain Theory, 1984, 7, 247-261.

†State University of New York at Binghamton
††Also University of Connecticut
missteps in his treatment of realism, clarify our conflict with his strategy,
and elaborate our own strategy for modeling behavior at the ecological scale
of animal-environment systems (see below). In so doing, we shall attempt to
show that Walter's posture on realism, while understandable in the beleaguered
heroine of Tennessee Williams' play, is less sympathetic in a (reasonably
content) scientist.






Alternative or Contradictory Descriptions Do Not Deny Realism 

While Walter's discontent with ecological realism includes our neglect of
quantum-like brain phenomena, he sees the existence of fictions — be they 
scientists' oft-changing models of the world or animals' deceptive behavior in 
times of danger or play — as a more fundamental difficulty because they belie 
the claim that reality can be apprehended. 

The pervasiveness of fictions, deception, play, and so on, make the 
whole ideology of "realism" seem rather unlikely to me, as a 
productive model for mammalian nervous systems. A notion of useful 
fictions ("useful" perhaps to be defined in neo-Darwinian terms) 
seems more likely than either ecological, or naive, realism, to 
yield an adequate description of this most complicated organ system.
(p. 233)

Not surprisingly, we do not agree with this evaluation of the ramifications of
such phenomena. First, dubbing them "fictions" is inaccurate and misleading. 
And, second, it is unlikely that fictions, with the suggestion that the at- 
tainment of goals is accidental, could ever be reliably useful. Let us 
elaborate this argument. 

The notion that science engages in the fabrication of useful fictions has 
a parallel in legal practice (Walter, 1980). Just as it is convenient but in- 
correct to conceive of a corporation as a single person in certain legal cir- 
cumstances so, too, is it useful but fictitious to conceive of space as 
Euclidean in some circumstances and curved in others. Walter claims that sci- 
ence would be better served by acknowledging that its models, however useful, 
are fictions "because the inconsistencies between scientific views of 'reali- 
ty' in different contexts will be more damaging" (Walter, 1980, p. R366). 

But do the seeming contradictions entailed by different characterizations 
of space, for example, remove all characterizations from the realm of reality 
(unqualified by quotation marks)? In other words, if a given notion changes 
relative to changes in the problem of interest, does this relativity preclude 
a consideration of that notion as objective and real? We have argued else- 
where that it does not and, indeed, that the concept of an absolute reality 
that would be appropriate for all grains of analysis is untenable (Gibson, 
1979; Michaels & Carello, 1981; Shaw, Turvey, & Mace, 1982; cf. Prigogine & 
Stengers, 1984, chap. 7). 

Appropriateness is the key idea here: the level of description of reality
must be commensurate with the level of inquiry, that is, with the type of
systemic interactions that are of interest (cf. Rosen, 1978). Although Walter
(1980) says, "When making human-scale measurements, for example, precision
seldom requires us to incorporate either relativistic space curvature or
super-spacelike microtopological fluctuations" (p. R367), it is not
disembodied "precision" that renders such analyses unnecessary. Rather, those
analyses are inappropriate because human activities do not occur at those
levels. Human (and animal) behavior occurs with reference to the
animal-specific, activity-relevant properties of the environment, what Gibson
has termed affordances (1979). Affordances, it is proposed, are the
appropriate level of description of reality for the ecological scale. The
lengthy, difficult search initiated by Grinnell (1917) and Elton (1927) to find
a systematic and evolutionarily consistent way to define the econiche (the
related environmental realities supporting a given species' lifestyle) has
begun to focus on the view of the econiche as an affordance structure (Alley,
in press; Patten, 1982).

Affordances are both relative — they are defined with reference to a 
particular animal — and objective — they are defined by persisting properties of 
the environment. As an example, consider a brink in a surface. For an animal 
of a given size, that brink affords stepping down; for an animal of a given 
smaller size, that brink affords falling off. The reality of that particular 
layout of surfaces as a step-down place or a falling-off place is relative to 
the animal. Yet the nature of those relative realities is determined by the 
independent character of the surface layout— for example, that it is comprised 
of vertically separated substantial surfaces rather than liquid ones. This 
echoes a point made by Lewis (1929): 

Relativity is not incompatible with, but requires, an independent
character in what is thus relative. And second, though what is thus
relative cannot be known apart from such relation ... all such relative
knowledge is true knowledge of that independent character
which, together with the other term or terms of this relationship,
determines this content of our relative knowledge. (pp. 172-173)

The coexistence of contradictory descriptions of reality (e.g.,
step-downable vs. not step-downable, curved vs. Euclidean space) does not mean
that these descriptions are fictions (cf. Ben-Zeev, in press). It simply
means that different problems appeal to different aspects of reality. No one
description is universally privileged (cf. Alley, in press; Rosen, 1978).
Indeed, contrary to Walter's efforts to marshal quantum phenomena in 
opposition to realism, the same point has been made for that domain by 
Prigogine and Stengers (1984): 

The irreducible plurality of perspectives on the same reality
expresses the impossibility of a divine point of view from which the
whole of reality is visible (p. 224). The real lesson to be learned
from the principle of complementarity [italics added], a lesson that
can perhaps be transferred to other fields of knowledge, consists in
emphasizing the wealth of reality, which overflows any single language,
any single logical structure. (p. 225)

Biased by his concern about what scientists do when they theorize about
the world, Walter is confused in his attitude toward what animals (including
humans) do when they perceive their environments. He claims that the fictions
by which scientists think they understand the universe have parallels in those
cases where perceivers are duped by deceptions. We have already argued that
scientific models of natural phenomena need not be considered fictions, even
if models of the same phenomenon at different levels are inconsistent. But
surely there are scientific models that are just plain wrong (phlogiston,
aether, and spontaneous generation, to name a few). Do these speak to the
possibility of perceivers knowing reality? They do not because they involve
issues of scientific realism, not perceptual realism (see Blackmore, 1979).
That is to say, the question of whether or not scientists can be successful in
understanding nature is independent of whether or not perceivers are
successful in knowing the environment as it constrains their day-to-day
activities. Scientists can flounder for any number of reasons (religious
dogma, bad experiments, stupidity), but for animals to "move so they can eat,
and eat so they can move" (Iberall, 1974) and thereby survive, they must be in
contact with the facts of their environments. Animals cannot act effectively
with respect to fictions.

What of Walter's contention that the fictions are useful? Doesn't that
empower them to guide activity? It is not at all clear how a fiction,
unfettered as it is by actual states of affairs, could ever be useful. What
guides the construction of a fiction so that it is at least relevant to an
intended action, so that, for example, a given layout of surfaces is
fictionalized as being in the realm of stepping (on) or falling (off) rather
than swimming (in), squeezing, eating, ad infinitum? And by what criterion
might a given fiction be deemed useful? There must be some standard of
comparison. If the actual state of affairs provides the comparison, realism
cannot be avoided.

Deception Presupposes Realism

Walter's example of deceptive animal behavior might seem tailor-made for
a fiction framework. A mother bird saves her offspring by feigning injury so
that a fox will follow and attack her in the mistaken belief that her broken
wing will prevent her escape. She has created a fiction (the predator
perceives an injury that does not exist) that is useful in preserving her
species. Such circumstances are quite rare in nature, however; not all
animals engage in deception, and, for those that do, deception constitutes a
small part of their behavioral repertoires. Deception provides a disputable
foundation, therefore, upon which to build an account of perceiving.
Nonetheless, we would emphasize the lawful basis that allows the mother to
enact a successful charade and the fox to act upon it. She must constrain her
musculature in just that way that will produce postural and joint adjustments
specific to a particular dynamic condition (viz., material structure too weak
to support the characteristic wing movement). For his part, the fox must
detect the dynamics that underlie the bird's kinematic display. In order to
pursue a realist basis for deceptive behavior, we will elaborate this
so-called kinematic specification of dynamics (or KSD) principle (Runeson,
1977/1983; Runeson & Frykholm, 1983).

The principle starts with the reasonable assumption that, because the
body is composed of certain masses and lengths and types of joints, only
certain movements will be biomechanically possible. The biomechanics will also
determine what one must do to maintain balance and cope with reactive forces
(those "back-generated" by the act of moving). The kinematic properties of an
action (its variously directed motions, its accelerations and decelerations)
are determined by the dynamic conditions that underlie it: the forces produced
intentionally and unintentionally by the animal and those supplied by the
surrounding surfaces of support. The KSD principle suggests that a reciprocal
relationship also exists: The kinematic properties of acts are transparent to
the dynamic properties that caused them. For an observer, this principle
reads: The ambient optic array (see Gibson, 1979; Lee, 1974, 1976) is
structured by an animal's movements such that macroscopic qualitative
properties of the optic array are specific to and, therefore, information
about, the forces that produced the movements.

The principle finds support in experimental investigations of human
movement perception that use Johansson's (1973) patch-light technique. This
methodology entails limiting an observer's view of actors (i.e., people who
engage in activities) to small lights that are attached to their major joints.
When a person engages in some activity, a transforming pattern of lights is
generated. Perceivers find this limited optical structure to be informative
about a number of properties, including metrical (length of throw of an
invisible thrown object of unknown mass [Runeson & Frykholm, 1983]),
biomechanic (gender of a walker [Cutting, Proffitt, & Kozlowski, 1978;
Kozlowski & Cutting, 1977; Runeson & Frykholm, 1983]), and kinetic (the weight
of a lifted box [Runeson & Frykholm, 1981]). Importantly, Runeson and
Frykholm (1983) have shown that perceivers are not easily fooled by actors'
efforts to be deceptive. Despite attempts to fake the weight of a lifted box,
observers not only perceive the real weight but are aware of the deceptive
intention and the intended deception (i.e., what weight is being faked) as
well. Similar results are found in attempts to be deceptive about one's
gender (through gait and carriage in a variety of actions): observers are
aware of both real gender and faked gender. The point to be underscored is
that an actor can structure light in ways that provide information about
conditions that do not exist (see Gibson, 1966; Michaels & Carello, 1981;
Turvey et al., 1981, for realist accounts of this fact) while simultaneously
(and unavoidably) providing information about conditions that do exist, and
perceivers can be aware of both.

Runeson and Frykholm draw a parallel with the dual reality of pictures,
especially as it has been described by Gibson: There is information about
objects represented in the picture and information about the picture itself as
an object. "The duality of information in the array is what causes the dual
experience" (Gibson, 1979, p. 283). The possibility of dual awareness may
speak to the dearth of true deceptions in nature. For very sound physical
reasons, situations that lend themselves to single awareness deception are,
contrary to what Walter seems to imply, difficult to manufacture and, in
consequence, quite rare. Intraspecific threat and play behavior, on the other
hand, are found throughout the animal kingdom. But it seems to be a misnomer
to label these "deceptions" in the sense of trickery. Baboons who bare their
teeth have not fabricated a fearsome weapon. They are suggesting that they
would rather not use the ones they have. Chimpanzees who play attack-and-flee
are not deluded; they behave differently in true fight-escape circumstances
(Loizos, 1969). Play provides an opportunity to learn about one's
environment, conspecifics, and one's own behavioral possibilities.

We have argued that characterizing perception as useful fictions is
inadequate to explain behavior in natural circumstances. An explanation of
effective behavior requires a realist framework with the animal-environment
system as the unit of analysis. Walter, however, is skeptical of whether such
an analysis is possible. We contend that his objection is based on an
overevaluation of what can be distilled from brain state accounts and a
misunderstanding of what "animal-environment system" means. We will deal with
each of these issues in the next two sections.









Brain States Are An Inadequate Basis For Ascribing Intentional Content 

Walter implies that any perspective that does not advert to observations
of brain states cannot provide a dynamically useful formulation of behavior.
However, he prudently avoids any discussion of how observations of brain
states would yield the proposed useful formulation. Presumably, Walter's
advocated observations or measurements of the brain, no matter how precise or
vague those measurements may be, would provide only extensional descriptions.
And, presumably, a physical or biological theory of the brain strictly
consistent with such observations could only be extensional. At best,
observations of brain states, purely interpreted, would lead to an account
roughly of the form: In the context of functional brain organizations P and Q,
functional brain organization R has the capacity of inducing functional brain
organization S. This would not be a dynamically useful formulation of
behavior. No matter how elaborate and detailed such an extensional account
becomes, it will never allow Walter to answer apparently straightforward
questions about prosaic behaviors. For example, how does an outfielder know
to charge in rather than retreat to catch a ball (Todd, 1981)? Why does a
child, on seeing a particular surface, initiate crawling rather than walking
to traverse the surface (E. Gibson, 1983)? The important ingredient missing
from the foregoing brain-state based account of behavior is intentionality.

A dynamically useful formulation of behavior grounded in observations of
brain states requires minimally (1) a principled basis for individuating brain
states, and (2) a principled basis for ascribing content to individuated brain
states. The latter refers to the problem of systematically upgrading the
extensional characterizations of brain states to intentional
characterizations, ordinarily expressed by intensional statements (Dennett,
1969; Fodor, 1981; but see Searle, 1983). The point is that without
identifying the contents (the significances, the meanings, the message
functions, the signalling functions, etc.) of brain states, the brain
theorist's view of brain function in relation to behavior is empty. The
intentional characterization earns for the brain theorist the luxury of
addressing the question of what the brain states are about. From what
observations and on what grounds would an advocate of the explanatory power of
brain states fashion intentional characterizations? Those characterizations
arise at and are the sine qua non of the ecological scale of
animal-environment systems.

Intentional characterizations should not be interpreted as referring to
systemic states that are in addition to or separate from those extensionally
characterized. Intentional characterizations usually comprise alternative
(discrete, symbolic) descriptions of a system's states, descriptions that
complement the extensional (continuous, dynamical) accounts of how a system is
doing what it is doing. Pattee (e.g., 1973, 1977) has been foremost in
identifying the problem of understanding how these two complementary modes of
description of any complex system can be treated in a physically consistent
way. The ecological approach to perception and action has been concerned
similarly with the complementarity of intentional and extensional
characterizations (e.g., Carello, Turvey, Kugler, & Shaw, 1984), but it has
been concerned more directly with elaborating the extensional basis for
ascribing intentionality to states of the animal-environment system in a
principled manner (e.g., Gibson, 1979; Kugler, Kelso, & Turvey, 1980, 1982;
Turvey et al., 1981). This strategy has been chosen because the principled
ascription of content to the states of a system rests ultimately on the
accuracy and specific predictions of the extensional account of the system.
As Dennett (1969) puts it:
The ascription of content is thus always an ex post facto step, and
the traffic between the extensional and the intentional levels of
explanation is all in one direction. (p. 86)

To the extent that the extensional basis for a system's phenomena is 
underestimated and/or unknown, the intentional characterization of the system 
is likely to be ungrounded and fatuous; ordinary systemic states get ascribed 
near magical functions or powers (section below). And this latter statement 
identifies, in a nutshell, the danger and inadequacy of seeking an account of 
behavior, as Walter advocates, in observations limited to brain states. 

The Animal-Environment System as the Appropriate Unit of Analysis 

Walter focuses his attack on realism on Turvey and Carello (1981). He 
discusses the position thusly:

This position claims that the joint situation of an organism and its 
environment is the only correct fundamental concept for brain/mind 
modeling... I regard their presumption that a state of the
brain-and-environment nexus can be observed as a fatal flaw in ecological
realism. In my view, the state of a mammal's brain cannot,
in most situations, usefully be observed...without so severely
interfering with that state, by your observing...that the state will
change in an unpredictable and uncontrollable way.... (p. 231)

Interestingly, the word "brain" never appears in the Turvey and Carello
manuscript. Indeed, eschewing brains as the appropriate entities to model for
an understanding of psychological phenomena is at the heart of using
"ecological" to modify our brand of realism. We are interested in how
organisms (including humans) are able to perceive their propertied
environments in a way that will allow them to behave effectively with respect
to those environments. A runner, be it human, gnu, or cockroach, does not
steer around representations or brain states; it avoids real obstacles and
goes through real openings. Couching problems in such terms is not, as Walter
claims, simply a "programmatic and descriptive phase" that ecological realism
is going through. The "dynamically useful formulation of behavior" that
Walter asserts is unavailable from our strategy not only is found in a realist
framework but, we would argue, can only be provided by such a perspective.
One of Gibson's favorite examples, the problem of controlled collisions in
locomotion, will be used to buttress this argument.

As an animal moves through a cluttered surround, it sometimes steers
around objects, sometimes contacts them gently, and sometimes collides with
them violently. In order to control encounters with the environment,
activity-relevant (dynamically useful) information must be available. This
includes information specific to what is moving (e.g., the animal or the
objects that surround it), direction of locomotion, obstacles and apertures in
one's path, time to contact (if it should occur), and force of contact (if it
should occur). This information has been demonstrated by a number of
investigators (e.g., E. Gibson, 1983; J. Gibson, 1979; Lee, 1976, 1980;
Lishman & Lee, 1973; Schiff, 1965) to exist in what might be termed the
morphology of the optic flow field (Kugler, 1983; Kugler & Turvey, in press;
Solomon, Carello, & Turvey, 1984). We will highlight some of the findings
here but for detailed analyses, the reader should refer to the cited works.
Although the problem of distinguishing one's own movement from
displacements of the surround has been a long-standing puzzle in orthodox
accounts of perceiving, Gibson (1979) provided a simple solution, viz., global,
smooth change in the optic array specifies egomotion, local discontinuous
change specifies motion of an object in the environment. Moreover, one's
direction of locomotion is also specified by the form of the optic flow field:
Global optical expansion specifies forward movement (where the focus of
expansion specifies the point toward which one is moving) while global optical
contraction specifies retreat (where the focus of contraction specifies the
point from which one is moving). If the appropriate flow fields are generated,
the appropriate actions will be constrained (e.g., in the face of simulated
global optical expansion, a person will make postural adjustments backward to
compensate for the perceived forward movement [Lishman & Lee, 1973]; when
confronted with local optical expansion, a person [or animal] will duck
[Schiff, 1965]). The same sort of analysis distinguishes obstacles from
apertures: A closed contour is specified as an obstacle when there is a loss of
structure outside the contour during approach; it is specified as an opening
when there is a gain of structure inside the contour during approach (J.
Gibson, 1979). Infants as young as six months will duck from approaching
obstacles but try to look inside approaching openings (E. Gibson, 1983).
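
The mapping from flow-field form to what it specifies can be made concrete
with a toy sketch. The function name, the sampling of the flow field as
discrete points, the 0.8 threshold for calling a change "global," and the
supplied focus are all illustrative assumptions, not part of the cited
analyses; the smooth versus discontinuous distinction is not modeled at all.

```python
# Toy sketch only: names, thresholds, and the sampled-point representation
# are hypothetical. It maps the form of a sampled flow field onto the
# specifications described in the text.

SPECIFIES = {
    ("global", "expansion"): "forward egomotion",
    ("global", "contraction"): "retreating egomotion",
    ("local", "expansion"): "approaching object",
}


def classify_flow(points, flows, focus=(0.0, 0.0), global_fraction=0.8,
                  eps=1e-9):
    """Classify a sampled optic flow field.

    points: list of (x, y) positions in the optic array (assumed sampling).
    flows:  list of (dx, dy) optical velocities at those positions.
    focus:  assumed focus of expansion or contraction.
    """
    # Keep only the samples at which the optical structure is changing.
    moving = [(p, f) for p, f in zip(points, flows)
              if abs(f[0]) > eps or abs(f[1]) > eps]
    if not moving:
        return "no change"

    # Change over (nearly) the whole array counts as global, else local.
    fraction = len(moving) / len(points)
    scope = "global" if fraction >= global_fraction else "local"

    # Net outward component relative to the focus: positive means expansion.
    radial = sum((p[0] - focus[0]) * f[0] + (p[1] - focus[1]) * f[1]
                 for p, f in moving)
    pattern = "expansion" if radial > 0 else "contraction"

    # Any other local change is read simply as motion of an object.
    return SPECIFIES.get((scope, pattern), "object motion")
```

Under these assumptions, a field in which every sampled point streams outward
from the focus is classified as forward egomotion, whereas outward streaming
confined to a small region is classified as an approaching object.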

If an animal wishes to steer around objects, it must move in such a way 
that optical expansion is centered in openings rather than obstacles. In or- 
der to contact objects (and to vary the force with which they are contacted), 
two more optical flow properties are needed. The inverse of the relative rate 
of dilation of a topologically closed region of the optical flow field (e.g., 
that structured by a wall) specifies the time at which a moving animal will 
contact that region. The derivative of the time-to-contact variable is infor- 
mation about the imminent momentum exchange: If it is greater than a certain 
critical value, the animal will stop short of contact; if it is equal to that 
critical value, the contact will be soft; if it is less than that critical 
value, there will be a momentum exchange and the contact will be hard (Kugler, 
Turvey, Carello, & Shaw, 1984; Lee, 1976, 1980). 
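
The two optical variables just described can be sketched numerically. The
sketch below is illustrative rather than a rendering of the cited analyses:
the function names are hypothetical, the critical value of -0.5 is the one
that follows from assuming constant deceleration, and the closed-form rate of
change of time-to-contact uses physical distance and speed only to generate
example values (the animal itself is assumed to have only the optical
variables).

```python
import math

# Illustrative sketch; names are hypothetical and constant deceleration
# is an assumption made to obtain a concrete critical value.

CRITICAL_TAU_DOT = -0.5  # critical value under constant deceleration


def optical_angle(size, distance):
    """Visual angle (radians) subtended by a surface patch of a given size."""
    return 2.0 * math.atan(size / (2.0 * distance))


def time_to_contact(theta, theta_rate):
    """Inverse of the relative rate of dilation: theta / (d theta / dt)."""
    return theta / theta_rate


def tau_dot_constant_braking(distance, speed, decel):
    """Rate of change of time-to-contact for an approach at the given speed
    with constant deceleration: -1 + distance * decel / speed**2."""
    return -1.0 + distance * decel / speed ** 2


def classify_contact(tau_dot, critical=CRITICAL_TAU_DOT, tol=1e-9):
    """Read the imminent momentum exchange off the derivative of
    time-to-contact."""
    if tau_dot > critical + tol:
        return "stops short"
    if tau_dot < critical - tol:
        return "hard contact"
    return "soft contact"
```

For instance, with a wall 10 m away and an approach speed of 10 m/s (both
made-up values), a braking deceleration of 5 m/s^2 puts the derivative exactly
at the critical value, and the sketch classifies the encounter as a soft
contact; stronger braking stops short and weaker braking yields a hard
contact.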

Notice that these properties do not exist in the animal or in the
environment but are only defined for the animal-environment system. The
components of the system are not ruled by the indeterminacy that governs
conjugate variables in quantum mechanics. That is to say, an exact
description of one component does not mean that the other component cannot be
determined. On the contrary, measuring one of the components in isolation not
only fails to provide an understanding of the system but gives a misleading
picture of the component that is being measured. This is the problem of
overdecomposing a partial system from the total system that includes it
(Turvey & Shaw, 1979; cf. Ashby, 1963; Humphrey, 1933; Weiss, 1969). Although
science requires decomposition to a certain extent in order to make its
problems manageable, the parsing of systems cannot be done cavalierly. An
unprincipled selection of a system in which a phenomenon is thought to reside
may make the phenomenon appear capricious and compel the scientist to
attribute magical powers or content to the partial system (Ashby, 1963; Turvey
& Shaw, 1979). The appropriate grain of analysis, however, may reveal the
law-governed determinacy that is unavailable in the partial system (Weiss,
1969).
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For example, if we take a climber-stairway system (Warren, 1984) as an 
instance of an animal-environment system, several points can be illustrated. 
First, there is optical information for a category boundary for ac- 
tion — perceivers can see which of a variety of stairways (constructed with 
risers of varying heights) are climbable in the normal way (i.e., without us- 
ing hands or knees). Second, there is a perceptual preference for stairways 
that would be easiest to climb (as determined by measures of energy expendi- 
ture during climbing). Third, both of these relationships can be described by 
a method of intrinsic measurement, in which one part of given system (e.g., on 
the animal side) acts as a natural standard against which a reciprocal pert of 
the system (e.g., on the environment side) can be measured (Warren, in press; 
Warren & Shaw, 1981; cf. Bunge, 1973; Gibson, 1979). Thus, the critical r^ser 
height/leg ratio, indexing the action boundary, is .89 whereas the optimal ra- 
tio, indexing minimum energy expenditure, is .26. These ratios are the same 
for all climbers, short and tall. Finally, each of these ratios is a measure 
of animal-environment fit; each is an index of the state of that system. No- 
tice that, unlike Walter's quantum systems, the state does not change by 
measuring it and predictions are not invalidated by observations. For a given 
individual, if the ratio of riser height to leg is less than or equal to .89, 
the stair will be climbable; if the ratio equals .26, that stair will be (rel- 
atively) energetically cheap to climb. Those relationships do not change. 
And nowhere in this analysis is it suggested that ^rain states can be or ought 
to be observed. 
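
The intrinsic measurement described above can be sketched in a few lines of code. This is only an illustration, not Warren's procedure: the function names and the example leg lengths are assumptions made for the sketch, and only the two ratios (.89 and .26) come from the text.

```python
# Ratios quoted in the text; everything else in this sketch is hypothetical.
CRITICAL_RATIO = 0.89  # riser height / leg length at the action boundary
OPTIMAL_RATIO = 0.26   # riser height / leg length at minimum energy expenditure


def riser_ratio(riser_height, leg_length):
    """Measure the riser in units of the climber's own leg (intrinsic measure)."""
    return riser_height / leg_length


def is_climbable(riser_height, leg_length):
    """True if the stair can be climbed without using hands or knees."""
    return riser_ratio(riser_height, leg_length) <= CRITICAL_RATIO


def optimal_riser(leg_length):
    """Riser height that is energetically cheapest for this leg length."""
    return OPTIMAL_RATIO * leg_length


if __name__ == "__main__":
    # Hypothetical leg lengths (cm) for a shorter and a taller climber,
    # each facing the same 65 cm riser.
    for leg in (70.0, 90.0):
        print(leg, is_climbable(65.0, leg), optimal_riser(leg))
```

The same riser is classified differently for the two climbers, while the ratio that marks the boundary stays fixed. That is the sense in which the measure indexes the state of the animal-environment system rather than of either part alone.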

Brain States Are Not the Touchstone for Theories of Knowing 

Walter would not deny that behaviors like stairclimbing are observable 
without interference from the observer but he would, no doubt, claim that they 
are not useful or worthwhile to model. 

I have (Walter, 1980) characterized those aspects of behavior that 
are predictable from less severely interfering observations, as 
rather gross and physicalistic (contrasted with "psychodynamic"); 
they seem to obey a correspondence principle or classical limit. 
They also tend toward conspiring to give a systematically misleading 
impression...that they are a closed system, adequate to describe the 
brain. (pp. 231-232) 

Though "gross" may be used pejoratively, perceiving and acting are unabashedly 
macrophenomena. Walter's implication that the only interesting behavior is a 
microbehavior will sever him from consideration of a gannet's dive for a fish 
(Lee & Reddish, 1981), the baseball fielder's catch of a deep fly ball 
(Solomon, Carello, & Turvey, 1984; Todd, 1981), and his own efforts to avoid 
destruction on the San Diego Freeway (Gibson & Crooks, 1938). While 
microphenomena may have their place, that place is not a privileged one. They 
need not and will not serve all of science. Once again, this attitude is not 
idiosyncratic to ecological realists. Rosen (1978), for example, in stressing 
the functional and organizational character of certain physical systems, 
observed: 

What seemed to be emerging from such considerations was apparently 
the antithesis of the reductionist program: instead of a single 
ultimate set of analytic units sufficient for the resolution of any 
problem, we find that distinct kinds of interactions between systems 
determine new classes of analytic units, or subsystems, that are 
appropriate to the study of that interaction. (p. xvi) 
[These] families of analytic units, all of which are equally "real" 
[are] entitled to be treated on the same footing; the appropriate 
use of natural interactions can enormously extend the class of 
physical observables [italics added] accessible to us.... (p. xvii) 

Once again we see the theme of appropriate levels of reality, this time 
directed at the question of what counts as an observable for physics. 

We suspect that Walter would not be sympathetic to the above line of 
argument, countering that we ought to focus on what qualifies as a legitimate 
observable for psychology, instead of physics, for problems of knowing. This 
is apparent in his contrasting "physicalistic" with "psychodynamic" aspects of 
behavior, charging that the former are not "adequate to describe the brain." 
This is where his emphasis on vague states of human mind during thinking, 
rumination, and the like clashes most dramatically with our concern for the 
very unvague states of animal-environment systems during perceiving and act- 
ing. In his desire to understand brain (as the seat of mind), Walter holds 
thinking and, in particular, vague thinking as the focus of any theory of 
epistemic agents. But for us, reliable and reproducible behaviors must be the 
touchstone for any account of knowing. In infinitely varying settings, 
organisms are able to produce the same appropriate behavior consistently, 
adapting it to the particular circumstances. For example, countless times a 
day a bird will take off from a variety of surfaces of support at a wide range 
of heights and fly toward other surfaces of support at varying distances away, 
alighting on them gently. Sometimes it will steer around trees or pet cats 
and sometimes it will have a direct flight. Obstacles to and paths for 
locomotion and the appropriateness of accelerations and decelerations can be 
neither indistinctly specified in optical flow fields nor unreliably detected 
if the bird is to locomote through its cluttered terrain successfully. It is 
these kinds of behaviors, not indeterminate contemplations, that should 
provide the standard against which to judge the adequacy of theories of knowing. 

The example of a bird in flight is an important one because it contains 
one feature (collisions with plate glass windows) of the sort that Walter, 
among others, uses to try to refute realism. The style of the argument can be 
characterized as follows: A bird who sees the window as an opening and flies 
into it has not perceived reality correctly and has not acted effectively. 
But in situations of so-called perceptual "mistakes," we embrace the distinc- 
tion drawn by Lewis (1929) — ignorance of reality is not to be equated with 
erroneous knowledge of reality. A window does not structure the optic array 
at all points of observation so as to specify the substantiality of the trans- 
parent surface. The bird is ignorant of that aspect of reality because infor- 
mation about that aspect is not available to those points of observation along 
the bird's approach. Information about substantiality is available, however, 
to other points of observation, viz., on those paths where the optic array is 
structured by more reflective angles of the glass. When information about an 
obstacle to locomotion is not available, a bird will not change its path of 
locomotion. Perception in the first case is veridical; perception in the sec- 
ond case is "veridical but partial" (Lewis, 1929, p. 176). 

A Final Note 

The ecological approach addresses common behaviors under the general rubric 
of controlled collisions (Kugler et al., 1984) or controlled encounters 
(Gibson, 1979). Such behaviors cut across species and allow us to highlight 
the very small number of design principles responsible for the wide range of 
activities that nervous systems support. While the processes that thinkers go 
through in conceiving and refining their ideas are intriguing, they should not 
provide the starting point for an explanation of perception in the service of 
activity. Putting them at the forefront of things to be explained is an 
apotheosis of the exotic and likely to be premature. As a parallel, consider 
the rainbow, which has fascinated philosophers and scientists for centuries. 
An adequate quantitative theory that accounts for all of the features and 
quirks of that phenomenon awaited the development of geometrical optics, and 
an understanding of the wave and particle-like properties of light, polariza- 
tion, and the complex angular momentum method (Nussenzveig, 1977). We may 
have to be similarly thorough in uncovering those fundamental principles at 
the ecological scale on which the reliable and reproducible behaviors of 
epistemic agents are based and on which an acceptable account of thinking will 
rest. 

THE INFORMATIONAL SUPPORT FOR UPRIGHT STANCE* 

Claudia Carello,† M. T. Turvey,†† and Peter N. Kugler††† 



Nashner and McCollum suggest that (1) perturbations of the body relative 
to the gravitational field and the surface of support parse into a small number 
of circumscribed kinetic states (regions of disequilibrium), and (2) a 
functional muscular organization, to restore upright posture, corresponds to 
each state. Though the authors talk about the sensing of these states, they 
give no indication of the relevant information. In a related way, we think, 
their references to neural signals that require interpretation, their appeals 
to memory (presumably of previous trajectories, previous initial conditions, 
previous sensory consequences, and previous postural achievements), and their 
supposition of anatomically defined senses uniquely tied to distinct frames of 
reference seem to run counter to the general Bernsteinian (1967) strategy that 
they are pursuing, that is, compressing in a principled fashion a movement 
problem of potentially very many degrees of freedom into a movement problem of 
very few degrees of freedom. 

In contrast, we are inclined strongly toward Gibson's (1966, 1979) revi- 
sion of the senses in terms of perceptual systems — active, interrelated sys- 
tems (as opposed to senses) that detect information (rather than have sensa- 
tions) about the perceiver-environment relation (rather than about their own 
states). Taking a Gibsonian stance, we ask whether there could be information 
specific to a circumscribed disequilibrium state, regardless of etiology; 
whether there could be information specific to approaching a region's boundary, 
regardless of the details of the trajectory; and whether such information 
can be independent of the mode of attention. We will start with Gibson's 
strict interpretation of information with respect to vision, demonstrate that 
equivalent information is obtainable by other perceptual systems, and conclude 
with speculation about properties that might generalize to the control of 
stance. 

Information is optical structure lawfully generated by the persistent and 
changing layout of surfaces and by the displacements of the body (as a unit 
relative to the surface layout and as parts relative to each other). Because 
the properties of the optic flow field are lawfully related to the properties 
of the kinetic field underlying them, they are said to specify those kinetic 
properties (see Runeson & Frykholm, 1983). Following Lee (1978), the optical 
flow field is exterospecific (specific to properties of surface layout), 
expropriospecific (specific to the orientational displacements of the point of 
observation relative to the surface layout), and propriospecific (specific to 
the relations among the parts of the body). And it can be specific in each of 
these ways simultaneously. How can this be? Each class of facts (extero, 
exproprio, proprio) imposes a distinct patterning (or structure, or form, or 
morphology; see Kugler & Turvey, in press) on the optical flow field. These 
patternings are superposed on each other but differentiable from one another. 

Nashner, L. M., & McCollum, G. The organization of human postural move- 

Consider one such patterning. An optical flow field can be treated as, 
roughly, a velocity vector field where the vectors represent angular 
velocities of the optical elements (see Gibson, 1979). When all vectors are 
undergoing a graduated magnification about a fixed point, then the point of 
observation is displacing rectilinearly toward the fixed point. It is 
suggested that any globally smooth velocity vector field specifies a displacement 
of the point of observation. (Note that the qualitative macroscopic proper- 
ties of the field are what matter, not the individual vectors.) One can 
sketch a law at the ecological scale (see Turvey, Shaw, Reed, & Mace, 1981) 
roughly of the form: 



displacement of point     LAWFULLY GENERATES     globally smooth velocity 
of observation            ------------------>    vector field 

This law defines a particular kind of information in Gibson's specificational 
sense, that is, 

globally smooth velocity      SPECIFIES      displacement of point of 
vector field                 ---------->     observation relative to surround 
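
The generating relation above can be illustrated numerically. The sketch below is a simplification under stated assumptions: it uses a unit-focal-length pinhole projection onto a flat image plane instead of the angular velocities of the optic array, and the function names, sample grid, and depths are hypothetical. For forward displacement at a given speed, a stationary point at image position (x, y) and depth Z has image velocity (x·speed/Z, y·speed/Z), so every flow vector radiates from the point being approached.

```python
def flow_vector(x, y, depth, speed):
    """Image velocity of a stationary point during forward displacement.

    Assumes a pinhole projection with unit focal length; (x, y) are image
    coordinates, depth is distance along the line of travel.
    """
    rate = speed / depth  # relative rate of dilation for this point
    return (x * rate, y * rate)


def radiates_from_focus(x, y, u, v, tol=1e-12):
    """True if (u, v) points directly away from the fixed point at (0, 0)."""
    cross = x * v - y * u  # zero when the vector is parallel to (x, y)
    dot = x * u + y * v    # positive when it points outward
    return abs(cross) <= tol and dot > 0.0


def is_radial_expansion(samples, speed):
    """Check the whole field: samples is a list of (x, y, depth) tuples."""
    return all(
        radiates_from_focus(x, y, *flow_vector(x, y, depth, speed))
        for x, y, depth in samples
    )


# Hypothetical layout: a 5 x 5 grid of image points at varying depths,
# with the fixed point itself left out.
SAMPLES = [
    (x / 4.0, y / 4.0, 2.0 + abs(x) + abs(y))
    for x in range(-2, 3)
    for y in range(-2, 3)
    if (x, y) != (0, 0)
]

if __name__ == "__main__":
    print(is_radial_expansion(SAMPLES, speed=1.5))   # approaching
    print(is_radial_expansion(SAMPLES, speed=-1.5))  # receding
```

The check is applied to the field as a whole, not to any single vector, which matches the point that the macroscopic pattern is what matters. Surfaces at different depths change the vector magnitudes but leave the radial form intact.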



Note that the optical property in the foregoing law is a kinematic 
abstraction (dimensions: length and time) of an energy distribution (light) 
structured by properties of a kinetic field (dimensions: mass, length, and 
time), that is, the field determined by the animal and surface layout. Inso- 
far as the same kinematic abstraction could be supported by other energy 
distributions modulated by the same kinetic facts, this analysis can be gener- 
alized to other modes of attention. For example, if a sound field with the 
same globally smooth morphology could be produced, according to Gibson's 
law-based/specificational interpretation of information, listeners should 
perceive themselves displacing relative to the surroundings (for confirming 
evidence, see Dodge, 1923; Lackner, 1977). Defining this morphology over 
deformations of the skin should yield the same impression of egomotion (again see 
Lackner, 1977). 

This treatment of expropriospecification can be extended to extero- and 
propriospecification. It is suggested that distinct flow morphologies, now 
discontinuous rather than smooth, specify facts of surface layout and relations 
among joints (Gibson, 1966, 1979). Again, these morphologies can be instanced 
by different kinds of energy distributions. Note that it is possible to 
describe vestibular stimulation (weights displacing in fluid-filled chambers 
relative to gravity's pull) and haptic-somatic stimulation (nonrigid mechanical 
deformations of the body's tissues) as kinematic or vector fields. And 
note further that, in principle, these velocity vector fields are characterizable 
alternatively as low-dimensional, macroscopic patternings. According to 
the ecological law formulation from above, if a given disequilibrium state 
gives rise to identical morphologies in the vector fields that are "attended 
to" vestibularly, haptically, and visually, then the same postural fact will 
be apprehended by each mode of attention. 

Nashner and McCollum are puzzled by neural signals having equivalent 
postural consequences when the signals are different. In our view, their puz- 
zlement is based on the wrong formulation: Information may be identical when 
neural signals, stimuli, etc., are different (see Gibson, 1966, p. 55). 
Nashner and McCollum feel that neural signals must be interpreted. Signal is 
a metaphor for sensations, and sensations strictly speaking can only be about 
states of nerves; hence the need for interpretation. Again, their formula is 
suspect. Information is about, in the sense of specific to, animal-environ- 
ment facts. It needs to be detected, and its differentiation and pick up by a 
perceptual system improve with practice, but to interpret it would be 
superfluous. 

We have suggested that the information about kinetic conditions (such as 
regions of postural equilibrium) is to be found in the morphology of kinematic 
fields. Moreover, the information is indifferent to the medium that has been 
structured kinematically. We conclude with a speculation about the morphological 
property specific to approaching a region's boundary: a generalization of 
the time-to-contact variable, τ, and its derivative (Lee, 1980). 

For the visual system, τ is the inverse of the relative rate of dilation 
of, roughly, the optic array. It specifies when one will contact a surface on 
the path of locomotion. Its derivative specifies how hard the imminent collision 
will be (Lee, 1980). Our conjecture is that τ may be a very general 
property of kinematic (flow) fields. Any kinetic field will have, as a rule, 
the equivalents of contactable "surfaces"; for example, attractors, basins, 
etc. Is there, as a rule, the equivalent of τ in the kinematic abstraction of 
any kinetic field (for example, nonrigid mechanical distortions of body 
tissues)? Suppose that the authors' regions of equilibrium are detected haptically. 
Then the proposed availability of τ and its derivative would provide a 
principled haptic basis for regulating forces to prohibit crossing regions. 
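
A small numerical sketch may help make the time-to-contact variable concrete for the visual case. It treats the variable (written `tau` here) as the visual angle subtended by a surface patch divided by that angle's rate of dilation, under the simplifying assumption of a head-on approach at constant speed. The function names, the finite-difference step, and the example numbers are hypothetical and are not taken from Lee (1980).

```python
import math


def visual_angle(size, distance):
    """Angle (radians) subtended by a patch of the given size at a distance."""
    return 2.0 * math.atan(size / (2.0 * distance))


def tau(size, distance, speed, dt=1e-4):
    """Estimate time-to-contact as angle divided by its rate of dilation.

    The rate is approximated by a central finite difference over a short
    interval dt of a constant-speed approach.
    """
    earlier = visual_angle(size, distance + speed * dt)  # dt ago, farther away
    later = visual_angle(size, distance - speed * dt)    # dt ahead, closer
    dilation_rate = (later - earlier) / (2.0 * dt)
    return visual_angle(size, distance) / dilation_rate


if __name__ == "__main__":
    # Hypothetical approach: a 0.5 m patch, 20 m away, closing at 4 m/s.
    print(tau(0.5, 20.0, 4.0), 20.0 / 4.0)
```

Neither distance nor speed needs to be known separately: the estimate depends only on the angle and its rate of change, and for small angles it closely approximates distance divided by speed. This is what makes it a candidate for generalizing to other kinematic fields.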

In sum, Gibson's treatment of information seems relevant to Nashner and 
McCollum in this sense: The low dimensionality of postural control they promise 
on the side of action could be reciprocated (as it must) on the side of 
perception. 

DETERMINING THE EXTENT OF COARTICULATION: EFFECTS OF EXPERIMENTAL DESIGN* 



Carole E. Gelfer,† Fredericka Bell-Berti,†† and Catherine S. Harris† 



Abstract. Substantial differences in the reports of the extent of 
anticipatory coarticulation have made the task of deciding among 
unifying models of the process difficult. Two conceptually distinct 
groups of theories of coarticulation have emerged, one positing 
the migration of articulatory features to preceding segments and 
the other positing the temporal cohesiveness of the components of 
segmental articulations. In studies of anticipatory lip rounding, a 
possible source of the differences reported in its extent prior to a 
rounded vowel is that the alveolar consonants commonly employed in 
these studies are presumed to be unspecified with regard to lip 
configuration. Thus, the presence of EMG activity and/or protrusive 
lip movement during these consonants has been presumed to indicate 
vocalically conditioned lip activity. However, if this activity is 
directly related to the production of the consonant(s), then the 
interpretation of these results is problematic unless the experimental 
design allows for the differentiation of consonantal and vocalic 
effects. We offer here both data suggesting the need for such 
considerations and a paradigm that takes these considerations into 
account. 

Introduction 

The phenomena of anticipatory coarticulation have generally been presumed 
to reflect underlying aspects of speech motor control (e.g., Kozhevnikov & 
Chistovich, 1966; MacNeilage, 1970).¹ However, substantial differences in reports 
of the extent of anticipatory coarticulation make difficult the task of 
providing one model to account for these data. Two types of conceptually distinct 
theories of anticipatory coarticulation exist, both of which attempt to 
explain the apparently nondiscrete nature of speech output despite a presumed 
discrete input. According to one type of theory, upcoming phones are scanned 
for salient features, which then migrate to as many antecedent phones as are 
neutral for, or in no way antagonistic to, the migrating feature (e.g., Daniloff 
& Moll, 1968; Henke, 1966; Kozhevnikov & Chistovich, 1966; Sussman & 
Westbury, 1981). Thus, given some number of consonants unspecified for lip 
configuration immediately preceding a rounded vowel, these models predict that 
rounding will vary in its onset in direct proportion to the number and/or 
duration of preceding segments. For example, Benguerel and Cowan (1974) 
reported that upper lip protrusion (in anticipation of a rounded vowel) begins 
as early as the first consonant in clusters of as many as six consonants. 
However, the second type of theory proposes that the observed co-occurrence of 
components of proximate segments results, not from feature migration, but from 
the overlapping of articulatory components of those segments (e.g., Bell-Berti 
& Harris, 1981, 1982; Fowler, 1980). Thus, in the absence of conflicting demands, 
the onsets of different components of the articulation of a given phone 
will bear a stable temporal relationship to each other. For example, Engstrand 
(1981) reported that lip protrusion activity for the rounded vowel /u/ 
occurs at a relatively fixed time before the onset of voicing for that vowel, 
regardless of the number of preceding consonants. 
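
The contrast between the two families of theories can be made concrete with a deliberately crude sketch. Neither function below is a model proposed in the literature cited above; they are caricatures that keep only the prediction at issue. The 150 ms lead used for the second account is a hypothetical placeholder, and the durations are subject TB's consonant string durations for the rounded-vowel utterances in Table 1.

```python
# Consonant string durations (ms) for subject TB, rounded-vowel set (Table 1).
TB_DURATIONS_MS = [75, 220, 245, 230, 300, 305, 280, 385, 360]


def onset_feature_migration(string_duration_ms):
    """Feature-migration caricature: rounding starts with the consonant string.

    Returns the rounding onset in ms before the end of the consonant string.
    """
    return float(string_duration_ms)


def onset_coproduction(string_duration_ms, fixed_lead_ms=150.0):
    """Coproduction caricature: rounding leads the vowel by a fixed interval.

    fixed_lead_ms is a hypothetical value chosen only for illustration.
    """
    return float(fixed_lead_ms)


def spread(values):
    """Range of a set of predicted onsets."""
    return max(values) - min(values)


if __name__ == "__main__":
    migration = [onset_feature_migration(d) for d in TB_DURATIONS_MS]
    coproduction = [onset_coproduction(d) for d in TB_DURATIONS_MS]
    print(spread(migration), spread(coproduction))
```

Under the first account the predicted onsets spread out as far as the string durations do. Under the second they do not vary with the string at all. Measured onsets that track string duration therefore look like support for migration, unless that activity turns out to belong to the consonants themselves.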

Despite their conceptual differences, however, a basic premise, having 
its roots in traditional linear generative phonology, is common to these 
models: namely, that a phone is neutral (i.e., unspecified) for a particular 
feature when that feature is not essential to its realization (Chomsky & 
Halle, 1968, pp. 402-403). Consequently, when activity associated with a given 
feature occurs during a segment that is "neutral" for that feature, that 
activity must be associated with another segment, and the time at which this 
activity begins is then assumed to reflect the extent of anticipatory 
coarticulation. In fact, however, it may be that feature descriptions are 
incomplete. For example, as Benguerel and Cowan (1974) have noted, American 
English /r/ is commonly produced with lip protrusion, although this protrusion 
often goes unmentioned in articulatory descriptions of /r/. 

Upon closer consideration, it would appear that many of the differences 
in the existing literature might be reconciled, and thus allow the development 
of a single explanation for them, were these assumptions reconsidered. The 
work presented here is part of a study designed to account for the conflicting 
results of previous studies, and therefore to test the predictions of the dif- 
ferent models of anticipatory coarticulation. 

Methods 

The alveolar consonants /t/ and /s/, whose articulation would be presumed 
to be neutral for lip constriction, were combined to form nine sequences designed 
to vary both in the number of consonants and in overall sequence 
durations.² The vowels in these utterances were /i/ and /u/, where V₁ was always 
/i/, while V₂ was either /i/ or /u/. Thus, there were two vowel conditions, 
the /iCₙu/ and /iCₙi/ conditions, each occurring with the nine different 
consonant string combinations, for a total of eighteen utterance types (Table 
1). The sequences were made by combining "words," and were presented to the 
subjects in orthographic writing. The subjects were instructed to speak at a 
comfortable rate, in a conversational manner, without undue attention to marking 
word boundaries. Thus, the subjects could, and did, differ in the way in 
which they executed a given sequence (for example, leased tool (/list#tul/) 
was often realized as the sequence [list:ul]). Two native speakers of American 
English³ produced between fifteen and twenty repetitions of each of the 
eighteen VCₙVs, spoken within the carrier phrase "It's a ___ again." 

Surface electromyographic (EMG) recordings (Allen, Lubker, & Harrison, 
1972) of orbicularis oris inferior (OOI), right and left, were made simultaneously 
with lip movement recording. Lip movements were tracked with an 
optoelectrical tracking system (Capstan Co. Model 400 Optical Tracking System) 
that sensed the position, in both the x and y planes, of an infrared 
light-emitting diode (LED) positioned on the lower lip. All data were 
simultaneously recorded on a 1^-channel FM tape. 



Table 1 

Consonant Strings: Number and Duration 

                                Consonant String Duration 
               Number of            (in milliseconds) 
Utterance      Consonants           TB            CH 

i#tu               1                75            68 
i#su               1               220           160 
i#stu              2               245           152 
is#tu              2               230           163 
is#su              2               300           238 
is#stu             3               305           253 
ist#tu             3               280           266 
ist#su             3               385           331 
ist#stu            4               360           355 

i#ti               1                83            71 
i#si               1               227           160 
i#sti              2               240           136 
is#ti              2               230           165 
is#si              2               335            -- 
is#sti             3               J31           245 
ist#ti             3               284           272 
ist#si             3               391           330 
ist#sti            4               392           337 


The EMG signals were rectified, and both the EMG and movement data were 
integrated and then digitized using a PDP 11/45 computer. The durations of 
the consonant strings were measured for each token of each utterance type, us- 
ing a PCM waveform-editing program. The beginning of the consonant string was 
defined as the point at which either the frication appeared in the waveform 
(in consonant strings beginning with /s/), or the higher formants disappeared 
from the waveform (indicating the onset of closure in consonant strings begin- 
ning with /t/). The point in the acoustic signal corresponding to the release 
of the consonant occlusion immediately preceding V₂ was identified as the end 
of the consonant string and served as the acoustic reference, or line-up, 
point for subsequent ensemble averaging. Thus, when V₂ was preceded by /t/, 
the line-up point was the burst; when V₂ was preceded by /s/, the line-up 
point was the end of frication before the second vowel. 
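
The processing chain described above (rectification, integration, and ensemble averaging about an acoustic line-up point) can be sketched in software. This is an assumption-laden illustration, not the procedure actually run on the PDP 11/45: the trailing running average stands in for the integration step, and the function names, window length, and toy token values are all hypothetical.

```python
def rectify(signal):
    """Full-wave rectification: keep the magnitude of every sample."""
    return [abs(sample) for sample in signal]


def integrate(signal, window):
    """Smooth with a trailing running average of up to `window` samples."""
    smoothed = []
    total = 0.0
    for i, sample in enumerate(signal):
        total += sample
        if i >= window:
            total -= signal[i - window]  # drop the sample leaving the window
        smoothed.append(total / min(i + 1, window))
    return smoothed


def ensemble_average(tokens, lineups, pre, post):
    """Average tokens sample by sample after aligning them at the line-up point.

    tokens  : list of sample lists, one per repetition
    lineups : index of the line-up point within each token
    pre/post: number of samples kept before/after the line-up point
    """
    count = len(tokens)
    return [
        sum(token[lineup + offset] for token, lineup in zip(tokens, lineups)) / count
        for offset in range(-pre, post + 1)
    ]


if __name__ == "__main__":
    # Two toy tokens whose line-up points fall at different sample indices.
    tokens = [rectify([0, 0, 1, -3, 1, 0]), rectify([0, 1, -3, 1, 0, 0])]
    print(ensemble_average(tokens, lineups=[3, 2], pre=1, post=1))
```

Aligning on the line-up point before averaging is what lets onset times be read off relative to the release of the consonant occlusion, even though the repetitions differ in when that release occurs within each recording.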

The beginning of OOI activity associated with the /VCₙV/ sequences was 
determined by identifying the time at which the EMG activity increased to a 
level equivalent to the baseline plus five percent of the difference between 
the baseline and the peak EMG levels. The beginning of the related movement 
was determined by identifying the onset of anteriorly-directed lip movement. 
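
The EMG onset criterion stated above translates directly into a threshold rule. The sketch below is a minimal reading of that rule. How the baseline was estimated is not stated in the text, so it is passed in as a parameter; the function names, sampling period, and sample values are hypothetical.

```python
def emg_onset_index(emg, baseline, fraction=0.05):
    """Index of the first sample at or above baseline + fraction * (peak - baseline).

    Returns None if no sample reaches the threshold.
    """
    peak = max(emg)
    threshold = baseline + fraction * (peak - baseline)
    for index, level in enumerate(emg):
        if level >= threshold:
            return index
    return None


def onset_ms_before_lineup(onset_index, lineup_index, sample_period_ms):
    """Express the onset as a time (ms) before the acoustic line-up point."""
    return (lineup_index - onset_index) * sample_period_ms


if __name__ == "__main__":
    # Hypothetical ensemble-averaged EMG levels, one sample every 5 ms.
    emg = [10, 10, 10, 12, 20, 60, 110, 80]
    onset = emg_onset_index(emg, baseline=10)
    print(onset, onset_ms_before_lineup(onset, lineup_index=7, sample_period_ms=5))
```

Because the threshold is defined relative to each utterance's own baseline and peak, the same rule can be applied across utterances and subjects whose absolute EMG levels differ.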

Results 

Some representative EMG data are shown for each subject (Figure 1a). The 
EMG signals in each panel represent the ensemble average OOI EMG activity of 
an /iCₙu/ utterance, with consonant string length (i.e., both the number of 
segments and the durations of the sequences) differing across panels. The onset 
of EMG activity occurs earlier as consonant string duration increases, so 
that it would appear that there has been a migration of lip rounding back to 
the beginning of the consonant string. In fact, when the onset of OOI EMG 
activity for each of the nine /iCₙu/ utterances is plotted against the respective 
consonant string durations (Figure 1b), it seems that, for both subjects, 
these onsets bear an obvious relationship to consonant string duration. 
That is, they occur earlier as string duration increases, with correlation 
coefficients of r = .98 and .97 for TB and CH, respectively. 
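
For readers who want to see how such a coefficient is obtained, here is a sketch of the Pearson product-moment correlation between consonant string duration and EMG onset time. The durations are subject TB's values from Table 1. The onset values are synthetic, generated as duration plus a constant purely to exercise the function; they are not the measured onsets, which appear only in the figures.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences.

    Raises ZeroDivisionError if either sequence has no variance.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Consonant string durations (ms) for subject TB, rounded-vowel set (Table 1).
DURATIONS_MS = [75, 220, 245, 230, 300, 305, 280, 385, 360]

# Synthetic onsets (ms before line-up): NOT measured data, illustration only.
SYNTHETIC_ONSETS_MS = [duration + 40 for duration in DURATIONS_MS]

if __name__ == "__main__":
    print(pearson_r(DURATIONS_MS, SYNTHETIC_ONSETS_MS))
```

A coefficient near 1 says only that onset and duration vary together. It does not say which segment the activity belongs to, which is why the comparison with the unrounded-vowel utterances is needed.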

Although these results might be interpreted as evidence that lip rounding 
has spread to the beginning of the "neutral" consonant string, we believe that 
it is imperative to determine whether all of the EMG activity is actually 
vowel-related or, alternatively, if it reflects consonantal lip gestures. In 
other words, if the OOI activity during the consonant string is vowel-related, 
we would not expect to find such activity during the same consonant string 
when it is followed by an unrounded vowel. We therefore examined OOI activity 
for the minimally contrastive /iCₙi/ utterances, samples of which are shown 
in Figure 2a. It is clear that, even within this unrounded vowel environment, 
there is a significant amount of orbicularis oris activity during the consonant 
string articulation. In fact, if we treat these /iCₙi/ data as we did 
those for the /iCₙu/ utterances, identifying the onset of EMG activity for 
each utterance and plotting these times against consonant string durations 
(Figure 2b), the resulting scatter plots are strikingly similar to those for 
the /iCₙu/ utterance set (Figure 1b). That is, OOI activity begins earlier 
as consonant string duration increases. (Subject CH produced only eight of 
the nine /iCₙi/ utterances.) Obviously, then, this EMG activity cannot reflect 
the onset of vowel-related lip rounding (i.e., the migration of the 
vowel feature) since the relationship between consonant string duration and 
the onset of OOI activity is observed in both rounded and unrounded vowel 
environments. Indeed, correlation coefficients are as high or higher for 
these /iCₙi/ utterances (r = .98 and .99 for TB and CH, respectively) than 
they are for their rounded counterparts. 

It is obvious, then, that the progressively earlier EMG activity must reflect 
consonant-related events. This is made more apparent when the EMG 
curves for the minimally contrastive /iCₙu/ and /iCₙi/ utterances are superimposed 
(Figure 3). The two signals diverge in the vicinity of the acoustic 
onset of V₂, with a second peak of activity evident when V₂ is /u/, while 
EMG activity is suppressed when V₂ is /i/. However, because the EMG signal 
never returns to a baseline level prior to /u/, the onset of the /u/-related 
EMG activity was determined statistically as the time at which the difference 
(in microvolts) following the divergence of the two signals reached significance 
(p < .05). 

Figure 1. Upper panels (1a): Ensemble-average EMG data for subjects TB 
(left) and CH (right) recorded from orbicularis oris inferior (OOI) 
for three /iCₙu/ utterances. Lower panels (1b): EMG onset time 
(ms before line-up point) vs. consonant string duration for 
/iCₙu/ utterances. Time 0 represents the release of the consonant 
occlusion, determined from the acoustic waveform. The arrows 
indicate the average of the acoustic onsets of the consonant 
strings. 

Figure 2. Upper panels (2a): Ensemble-average EMG data for subjects TB 
(left) and CH (right) recorded from orbicularis oris inferior (OOI) 
for three /iCₙi/ utterances. Lower panels (2b): EMG onset time 
(ms before line-up point) vs. consonant string duration for 
/iCₙi/ utterances. Time 0 represents the release of the consonant 
occlusion, determined from the acoustic waveform. The arrows 
indicate the average of the acoustic onsets of the consonant 
strings. 

Figure 3. Ensemble-average EMG data for the two subjects, recorded from 
orbicularis oris inferior (OOI) for three minimally contrastive pairs 
of /iCₙV/ utterances. 

Figure 4. Statistically determined point of separation ("EMG separation 
onset") between minimally contrastive pairs of /iCₙV/ utterances 
vs. the average duration of the consonant sequences of the /iCₙu/ 
utterances of each pair. 

The statistically determined onsets of rounded vowel activity are plotted 
as a function of consonant string duration for the nine minimal pairs for subject 
TB, and for eight minimal pairs for subject CH (Figure 4). In contrast 
to the consonant-related EMG activity (see Figures 1 and 2), these onsets bear 
no obvious relation to the durations of the consonant strings.⁴ Rather, with 
the exception of the /i#tu/ utterance, they occur within a fairly restricted 
range, bearing a stronger relationship to the onset of the rounded vowel than 
to the onset of the consonant string. 

The EMG data thus show the following: First, for these two subjects, 
some lip activity appears to be inherent in the production of alveolar conso- 
nants. Second, the onset of EMG activity for /u/ appears to be related to the 
acoustic onset of that vowel, and not to the compatibility of the vowel and 
consonant articulations. Finally, even when there is lip activity for adjacent 
consonants and vowels, they appear to be organized as independent gestures, 
as the separate peaks of OOI activity for the /iCₙu/ utterances suggest. 



Figure 5 shows movement data for both subjects, for the same /iCₙu/ 
utterances whose EMG data are presented above (Figure 1a). For TB, the data 
show a substantial forward lip movement in the vicinity of the acoustic onset 
of the consonant string, a position that is then sustained through V₂. However, 
while there is a less obvious separation between the consonant and vowel 
gestures in the movement than in the EMG records, there are troughs in the 
movement traces for all but the shortest utterance.⁵ For subject CH, the 
anterior lip movement associated with the rounded vowel is more clearly 
separated from the anterior movement occurring earlier in the utterance. 

When the movement traces for the /iCₙu/ and /iCₙi/ utterances are superimposed 
(Figure 6), the pattern is the same as that for the EMG records. 
That is, regardless of the identity of V₂, the curves are nearly identical 
through the consonant string, diverging in the vicinity of the onset of the 
second vowel. However, because of hardware limitations at the time of recording, 
the baselines for these data are not always aligned;⁶ for this reason we 
were unable to determine statistically the times at which each minimally 
contrastive pair differed, as we had done for the EMG data. Furthermore, when 
the temporal relationships between the consonant-related EMG and the earliest 
anteriorly directed movements are examined, there are clearly differences for 
the two subjects. For subject TB, the earlier onset of OOI activity is 
associated with consonant-related forward lip movement. That is, there is an 
appropriate contraction time interval between the EMG and corresponding movement 
(Figure 7a). For subject CH, however, the earlier OOI activity is not 
associated with any significant anterior lip movement for the consonant string 
(Figure 7b). Rather, this movement is associated with the first vowel. 

We are therefore faced with the question of what the consonant-related
EMG activity means in terms of movement for subject CH. Figure 8 shows OOI
activity for the three representative /iC_nu/ utterances, along with both the
corresponding horizontal and vertical movement traces. It can be seen that,
while the EMG and horizontal lip movements are poorly correlated in the
vicinity of the consonant string, there is a good temporal correlation (i.e.,
contraction interval) between the consonant-related EMG and vertical lip
movement. Thus, for this subject, the same muscle appears to be contributing to
both vertical movement (in the production of the consonant string) and
horizontal movement (in the production of the vowel), differences in
orbicularis oris function that have been noted previously (cf. O'Dwyer, Quinn,
Guitar, Andrews, & Neilson, 1981).

Figure 5. Antero-posterior lip position as a function of time for the two
subjects for three /iC_nu/ utterances. The arrows indicate the
average of the acoustic onsets of the consonant strings.

Figure 6. Antero-posterior lip position for both subjects as a function of

Figure 7. Ensemble-average EMG and lip position data as a function of time
for both subjects for three /iC_nu/ utterances.

Figure 8. Ensemble-average EMG and lip position data for subject CH for three
/iC_nu/ utterances. The thin line represents OOI-H data, the
thick line anterior lip position data (lip X), and the dashed line
vertical lip position data (lip Y).

Discussion 

The data offered here suggest that there are a number of reasons for the
difficulty in reconciling the differences between sets of previously reported
data on the extent of anticipatory coarticulation. One of these reasons
resides in the unproven assumptions that, if a speech sound's articulation has
not been described as including a particular gesture, then, first, that
gesture has little, if any, consequence for the production of the sound and,
second, that speech sound is "unspecified" for that gesture/feature. However,
phoneticians have long known that the description of the articulation of
speech sounds is incomplete (cf. Pike, 1943, p. 152); our data clearly
indicate that, for some speakers at least, some alveolar consonants
traditionally assumed to have no intrinsic lip gestures do in fact have such
gestures as part of their natural production. Thus, the assumption that these
consonants are neutral with regard to lip configuration is untenable.

These data also provide evidence of the complexity of the electromyo- 
graphic and kinematic data collected for studying coarticulation processes. 
First, it is impossible to separate active protrusion gestures from passive 
relaxation of lips that have been retracted, except by observing the activity 
of the muscles responsible for those protrusion gestures. Second, the EMG
data may more closely reflect the underlying segmental structure of speech than
do kinematic data. For example, while we see no trough in the movement traces
of the /i#tu/ utterance for subject TB, there are clearly separate peaks of
OOI activity for both the consonant and vowel segments, suggesting the
segmental nature of the underlying articulatory organization.

In addition to providing insights into the causes of some of the apparent
discrepancies resulting from problems in experimental design, we would also
suggest that another source of conflict in attempts to develop a single model
of anticipatory phenomena stems from presupposing that the timing of the onset
of rounding is an entirely anticipatory phenomenon. It is notable that in
both this study and our earlier work (Bell-Berti & Harris, 1981), the onset of
vowel-related lip rounding is closer to the acoustic onset of the rounded
vowel for sequences of the form /i#tu/ than for any other sequence. This
result might seem to provide some limited support for the feature migration
hypothesis, if this sequence were compared with only one longer sequence (see,
e.g., Sussman & Westbury, 1981). However, we believe that an equally
plausible explanation is that the result reflects the suppression of lip
rounding until the first vowel can be completed without distortion. That is,
the onset of rounding may be constrained by the carryover effects of a
preceding (unrounded) vowel. Thus, in a sequence like /i#tu/, where the
vowel-to-vowel interval is fairly short, the rounding onset might be delayed
relative to other sequences where the consonantal sequence occupies a longer
time slot. In fact, Sussman and Westbury's (1981) observation of systematic
differences in the onset of lip rounding as a function of the identity of the
preceding unrounded vowel may be interpreted as evidence of the same carryover
effect.

Summary



These data were part of a study designed to account for conflicting
results of previously reported studies by suggesting that at least some of the
apparent discrepancy arises from experimental design. Because our two
subjects produced alveolar consonants with significant orbicularis oris
activity in both rounded and unrounded vowel environments, we were able to
establish that those gestures that were variable in their onsets on both the
EMG and movement levels were clearly tied to something that was acoustically
variable as well, namely, the onsets of consonant strings of differing
durations. We also observed separate consonant and vowel-related activity, as
in the EMG records of the /iC_nu/ utterances, where there were almost always
distinct peaks for each. Furthermore, our EMG data may be interpreted as
reflecting a stable onset of lip rounding independent of consonant string
duration, except for the case of the shortest consonant string. And, while the
tendency has been to view all of these phenomena as reflecting only
anticipatory coarticulation, we believe it more likely that they represent the
combined effect of carryover and anticipatory processes.

Footnotes

1 We have limited ourselves here primarily to a consideration of
anticipatory phenomena. This limitation was imposed because most theoretical
discussions have focused on anticipatory coarticulation.

2 The literature in this area contains two different indices to consonant 
string length: the number of consonant segments (e.g., Daniloff & Moll, 1968; 
Lubker & Gay, 1982) and the duration of the consonant sequence (e.g., 
Bell-Berti & Harris, 1974, 1982; Engstrand, 1981). Although these two meas- 
ures are related, the relationship is not isomorphic (see, for example, Table 
1). 

3 Subject TB is a speaker of educated Greater Metropolitan New York City
English. Subject CH is a speaker of educated Central Florida English.

4 This result is compatible with results of other studies using subjects
known to produce the alveolar consonants /s/ and /t/ without lip rounding
(cf. Bell-Berti & Harris, 1982; Engstrand, 1981), although these studies
clearly still subscribe to the possibility that alveolar consonants have
inherently neutral lip specifications.

5 The observation of "troughs" in EMG and movement records is not new 
(cf. Bell-Berti & Harris, 1974; Engstrand, 1983; Gay, 1977). The fact that a 
trough is absent in movement records when the intervocalic consonant is short 
may not reflect differences in gestural organisation, but, rather, biomechani- 
cal constraints that could influence the response characteristics of the lips. 
That is, with movement being rather slow relative to EMG activity, it is hard- 
ly surprising that the lips do not have time to protrude, retract, and pro- 
trude again for the rounded vowel during the 75 ms /t/ closure. 

6 We would note, however, that there was no consistent pattern of DC
offsets between the /iC_ni/ and /iC_nu/ utterances, suggesting that these
differences were independent of vowel rounding.


THE ROLES OF PHONEME FREQUENCY, SIMILARITY, AND AVAILABILITY IN THE
EXPERIMENTAL ELICITATION OF SPEECH ERRORS*



Andrea G. Levitt† and Alice F. Healy††



Abstract. In two experiments subjects read aloud pairs of nonsense
syllables rapidly presented on a display screen or repeated the same 
syllables presented auditorily. The error patterns in both experi- 
ments showed significant asymmetry, thus lending support to explana- 
tions of the error generation process that consider certain phonemes 
to be "stronger" than others. Further error analyses revealed 
substantial effects of phoneme frequency in the language and effects 
of phoneme similarity, which depended on the feature system used to 
index similarity. Phoneme availability (the requirement that an 
intruding phoneme be part of the currently presented stimulus) was 
also important but not essential. We argue that the experimental
elicitation of errors provides critical tests of hypotheses generated
by the analysis of naturally occurring speech errors.

Recent interest in speech errors has focused largely on the evidence such 
errors provide about levels of linguistic analysis and psychological models of 
the speech production process. For example, Fromkin (1971), basing her analy- 
sis on a corpus of naturally occurring speech errors, found evidence in sup- 
port of the independence of various levels of linguistic analysis, including 
both phonemes and phonetic features. On the other hand, Garrett (1980), also 
basing his analysis on spontaneous-error collections, examined speech error 
distributions for the constraints they provide about a model of sentence 
production.

The development of experimental techniques for the elicitation of speech
errors (see, for example, Motley & Baars, 1976) provides a new source of data,
which, when used in conjunction with the evidence from naturally occurring
errors, greatly facilitates the modeling of speech error generation. As Fowler
(1983) points out, the experimental elicitation of speech errors permits tape
recording of subjects' responses so that errors are less likely to be misheard
or overlooked. Furthermore, experimental elicitation provides more thorough
tests of hypotheses generated by the analysis of spontaneous error
collections, especially when portions of the error pattern in the naturally
occurring corpus are based on relatively few examples. On the other hand,
there is always the danger of introducing influences in the laboratory that do
not apply in more natural settings.

Shattuck-Hufnagel and Klatt (1979) analyzed collections of naturally
occurring segment substitution errors and contrasted two types of error
generation explanations. In the case of the first type of explanation, it is
assumed that some segments are "strong" whereas others are "weak." Strong
segments might be those that occur more frequently in the language, are
acquired earlier, are unmarked in phonological theory, or are easier to
articulate. The precise definition of segment strength is less important than
the role strong segments play. Each segment substitution error has an
intended, or target, segment and a source for the intruding error. The
explanation predicts that strong segments appear more often as intrusions,
whereas weak segments appear more often as targets in segmental substitution
errors. A confusion matrix of such speech errors should thus be asymmetrical.
This asymmetry would reflect the pattern of strength versus weakness of the
segments involved.

In the case of the second type of explanation, on the other hand, the
tendency of one segment (y) to substitute for another segment (x) would be
related to their degree of similarity, but substitutions of x to y and y to x
would be equally frequent. A confusion matrix of speech errors, if such
errors arose as predicted by this type of explanation, should thus be
symmetrical.

Shattuck-Hufnagel and Klatt (1979) analyzed the confusion matrix
generated by 1620 substitution errors. The matrix proved to be asymmetrical.
However, further analysis revealed that the asymmetry was due almost
exclusively to four consonant segments /s, š, č, t/, such that errors of the
type /s/ to /š/, /s/ to /č/, and /t/ to /č/ were all more frequent,
respectively, than /š/ to /s/, /č/ to /s/, and /č/ to /t/. Once this source of
asymmetry was removed, the confusion matrix of segmental errors was no longer
significantly asymmetrical. However, the pattern of errors for /s, š, č, t/,
which contributed most to the asymmetry of the matrix, could not be accounted
for by stronger segments intruding more often, since, according to
Shattuck-Hufnagel and Klatt, /š/ and /č/, for example, are less frequent and
acquired later than /s/ (i.e., they are weaker), yet they intruded more often.

Shattuck-Hufnagel and Klatt proposed to account for the asymmetrical
pattern of their confusion matrix in terms of a palatalization mechanism. They
checked their corpus for factors that might "palatalize" the pronunciation of
a non-palatal consonant (e.g., /s/ becoming /š/), but no difference was found
between the source consonant environments in which palatalizing and
non-palatalizing errors occurred. When the vowel environments of the target
utterances were examined, Shattuck-Hufnagel and Klatt found that a
palatalizing error occurred proportionately more often before a high vowel
(e.g., /i/), but that this difference was not statistically significant.
However, their calculations were based on a relatively small number of
observations. The effect of the following vowel might indeed be reliable given
a larger number of observations.

The authors concluded that the evidence from their data suggests that
errors arise during the speech production process when one of two simultaneously
available segments is mis-selected for a slot in an utterance, with the two
segments generally being equally likely to be mis-selected.

Notice, however, that an explanation assuming that phonemes are not equal
in strength, in particular one for which a strong segment is defined as a more
frequent segment in the language,1 does not receive a fair test in a corpus of
naturally collected errors, because the prior probabilities of occurrence for
all the segments are not equal. Imagine an explanation of the error
generation process according to which segment strength is defined by segment
frequency and similar segments are likely to substitute for one another. Such
an explanation would predict that the rate at which a frequent segment would
be mispronounced given that it was intended would be lower than the rate at
which an infrequent segment would be mispronounced given that it was intended.
So, for example, for /s/ and /š/, similar segments that might easily be
confused, with /s/ as the stronger because it is more frequent, the rate of
/s/ being mispronounced given that it was intended should be lower than the
rate of /š/ being mispronounced. But the collection of naturally generated
speech errors reflects the frequency of occurrence of phonemes in English, not
just the error rates given that the phonemes are intended. Thus, since /s/ is
much more frequent in the language than /š/, it will occur much more often as
an intended phoneme so that it will occur more frequently as a target than
/š/, even if its rate of occurrence as a target given that it was intended is
lower. Furthermore, /š/, which is likely to substitute for /s/ because it is
very similar, will appear more often as an intrusion than as a target, because
of the high prior probability or frequency of /s/ as an intended phoneme. Note
that the asymmetry arises because of the segmental similarity of /s/ and /š/
and a great discrepancy in their relative frequencies of occurrence in
English. An experimental elicitation of errors using these segments in source
utterances provides a good way of avoiding the problem of unequal frequencies
of occurrence, because in the experimental situation, the intended utterances
can be assigned equal prior probabilities. If frequency contributes to
segment strength and if strength is a factor in the error generation process,
then /s/ should appear more often as an intrusion and /š/ more often as a
target, in the controlled experimental situation.
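
The base-rate argument in this paragraph can be made concrete with a small numerical sketch. All of the numbers below are hypothetical and chosen only for illustration; they are not taken from any corpus or from the experiments reported here. The sketch assumes that the frequent segment has the lower conditional error rate, and shows that it can still dominate the raw target counts when the prior frequencies are unequal.

```python
# Hypothetical illustration only: none of these numbers come from the study.

# How often each segment is intended (e.g., per 1,000 intended segments).
natural_priors = {"s": 900, "sh": 100}   # unequal, as in running speech
balanced_priors = {"s": 500, "sh": 500}  # equal, as in a controlled design

# Assumed conditional error rates, P(mispronounced | intended).
# The more frequent ("stronger") segment is assumed to be less error-prone.
error_rate = {"s": 0.01, "sh": 0.03}


def expected_target_counts(priors, rates):
    """Expected number of errors in which each segment is the target."""
    return {seg: priors[seg] * rates[seg] for seg in priors}


natural = expected_target_counts(natural_priors, error_rate)
balanced = expected_target_counts(balanced_priors, error_rate)

print("natural corpus :", natural)
print("balanced design:", balanced)
```

Under these assumed values, the frequent segment is the more common target in the "natural" counts even though its conditional error rate is lower, and the ordering reverses once the priors are equalized.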

Intuitively, /s/ and /š/ seem quite similar, but similarity between two
segments has not been clearly defined in the speech error context, although
several investigators (Fromkin, 1971; MacKay, 1970; Nooteboom, 1969) have
discussed the role of features in the error generation process. One way of
defining segment similarity might be on the basis of the number of shared
features. Clearly, the choice of a particular feature system can be crucial.
Given a particular feature system, segments might need to share all or almost
all features and only differ in some single individual feature (e.g., anterior
or high) or type of feature (e.g., features for place of articulation) for
errors to occur frequently. The role of segment similarity can be assessed in
two ways: 1) Does the similarity of two segments in an utterance affect the
tendency of subjects to make errors on those segments and 2) Given that an
error has occurred, how similar is the intruding phoneme to its intended target?

Another issue is whether it is necessary for the target and intruding
segments to be simultaneously available for a substitution error to occur that
involves them. In a very broad view, the availability of a segment as an
error source should be a function of its frequency in the language. A narrower
view might define segment availability such that the source of an error need
occur within a relatively constrained portion of the intended utterance. One
could assess this narrower view of availability experimentally by seeing
whether substitutions of y for x are more likely to occur when y is part of
the stimulus.

Finally, it may be that Shattuck-Hufnagel and Klatt's observed asymmetry
involving /s, š, č, t/ does reflect a palatalizing mechanism but there were
insufficient observations in the environment of high vowels or palatal
consonants. Again, the experimental situation permits a direct test of this
hypothesis.

The basic technique for the experimental elicitation of speech errors
involves what Baars (1980) calls the "competing plans framework." Essentially,
the subject is given two alternative plans for the production of an utterance
and is required to make a rapid response. For example, the subject might see
the series of word pairs "give book, go back, get boot, bad goof" flashed
rapidly on a screen. Notice that the fourth word pair, the test pair, "bad
goof" involves a reversal of the initial consonant pattern found in the first
three pairs, the bias pairs. After the test pair, at the sound of a buzzer,
the subject would be expected to say the now-occluded final pair as quickly as
possible. Under these conditions, a number of subjects will produce a speech
error and may even spoonerize the test pair, reversing the initial consonants,
and say "gad boof" instead.

We adapted this basic technique for the purposes of our study. Since
previous work (Baars, Motley, & MacKay, 1975) has shown that there is output
monitoring for the lexical status of spoonerized words (e.g., that "gad boof,"
which contains two non-lexical items, will occur less often as an error for
"bad goof" than "darn bore," which contains two lexical items, will occur as
an error for "barn door" in a similar sequence), we chose pairs of nonsense CV
syllables as stimuli.2 In pilot work, we found that subjects tended to make a
greater number of errors when they were asked to pronounce both the bias and
test items than when they pronounced only the test items. Hence, we required
subjects to pronounce all of the items flashed before them on a screen.3
Furthermore, pilot work indicated that when the bias pairs had a consistent
vowel pattern (e.g., compare the bias series "right lean, ripe leap, ride
leak" with the one given above), more errors tended to occur than when the
vowel pattern was inconsistent (see also Dell, 1984). Thus, we restricted our
bias pairs to those with consistent vowel patterns. We created our CV stimuli
from the four consonants in Shattuck-Hufnagel and Klatt's data base that had
been responsible for the initial asymmetry, /s, š, č, t/, plus the additional
consonant phoneme /θ/. The addition of /θ/ allowed us to test whether
similarity, defined as a single feature difference, depends on a specific
feature, since the consonants in the pairs /š, č/ and /θ, t/ differ on the
single feature continuant, according to Chomsky and Halle (1968), whereas the
consonants in the pair /s, θ/ differ on the single feature strident. The
consonant /θ/ also provides another relatively infrequent, but non-palatal,
phoneme to test against the infrequent palatal set /š, č/. We chose the vowels
/a, i, u/ for the test set, so as to be able to assess whether vowel height,
high /i, u/ versus low /a/, or vowel height and frontness, front high /i/
versus /a, u/, might be the possible source of palatalizing errors.

Experiment 1

In Experiment 1, pairs of CV nonsense stimuli were presented visually,
and subjects were asked to read all presented items as rapidly as possible.

Method 

Materials. Using the set of consonant phonemes /s, š, č, t, θ/, written
as s, sh, ch, t, and th, respectively, and the set of vowels /a, i, u/, we
constructed pairs of CV nonsense syllables. Since we eliminated pairs with
matched consonants (e.g., ta ti) as well as those with matched vowels (e.g.,
sa ta), there were twenty possible consonant permutations and six possible
vowel permutations for a total of 120 test stimuli. A set of 120 filler pairs
of CV nonsense syllables was analogously constructed using another set of
consonants /r, l, b, v, m/ and the same set of vowels /a, i, u/.
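
The stimulus count can be checked by enumerating the pairs directly. The following is a minimal sketch, assuming the orthographic spellings given above and that a CV syllable is simply the consonant spelling followed by the vowel letter; the function name `build_pairs` is our own and is not part of the original materials.

```python
from itertools import permutations

# Orthographic consonant spellings and vowels, as given in the Materials section.
TEST_CONSONANTS = ["s", "sh", "ch", "t", "th"]
FILLER_CONSONANTS = ["r", "l", "b", "v", "m"]
VOWELS = ["a", "i", "u"]


def build_pairs(consonants, vowels):
    """Return all CV-CV pairs whose two consonants differ and whose two vowels differ."""
    pairs = []
    for c1, c2 in permutations(consonants, 2):   # 5 * 4 = 20 ordered consonant pairs
        for v1, v2 in permutations(vowels, 2):   # 3 * 2 = 6 ordered vowel pairs
            pairs.append((c1 + v1, c2 + v2))
    return pairs


test_pairs = build_pairs(TEST_CONSONANTS, VOWELS)
filler_pairs = build_pairs(FILLER_CONSONANTS, VOWELS)

print(len(test_pairs), len(filler_pairs))
```

Using ordered permutations rather than unordered combinations matters here, because "su ti" and "tu si" count as different stimuli.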

Design. Each of the 120 test stimuli was preceded by three identical
bias pairs of nonsense syllables that were constructed analogously to the test
CV pair set and in which the order of the vowels was preserved but that of the
consonants was switched. For example, for the test stimulus su ti, the
presentation order was tu si, tu si, tu si, su ti. In order to prevent
subjects from anticipating a switch after three identical CV nonsense pairs, 30
of the test CV nonsense syllables were also presented as distractors in groups
of four (e.g., tu si, tu si, tu si, tu si), 30 in groups of three, 30 in
pairs, and 30 singly. The 120 filler CV nonsense pairs also served to divert
subjects' attention from the test stimuli consonants and pattern of
presentation. Thirty of the filler CV nonsense syllables were presented in
groups of four, 30 in groups of three, 30 in groups of two, and 30 singly. For
half the trials with the filler syllables, the last item preserved the
consonant order (e.g., ra li, ra li, ra li, ra li) and for half the trials the
last item reversed the consonant order (e.g., ra li, ra li, ra li, la ri). The
presentation of the test stimuli, distractors, and filler sequences was in
pseudorandom order with the constraint that there were four test sequences,
four filler sequences, and four distractor sequences in every block of twelve
sequences. There was a total of 1080 pairs of CV nonsense syllables presented
to subjects.

Subjects. Thirteen men and women participated in the experiment. Four
were volunteers from the Haskins Laboratories staff (who were relatively
knowledgeable phonetically), and nine were Yale University undergraduates
receiving course credit for their participation. (Five additional subjects
[one volunteer and four students] were tested, but their data were not
analyzed because they failed to read a substantial number of the syllable
pairs, and it was often not possible to determine what syllable pair they were
responding to when they did utter something.)

Apparatus and procedure. The pairs of CV nonsense syllables were
projected under program control onto the self-refreshing screen of a
Decgraphic 11 GT-40 computer terminal hooked up to a PDP 11/45 computer at the
rate of two syllable pairs a second. Subjects were asked to pronounce each
syllable pair aloud as accurately as possible. During this task, subjects
listened to white noise presented over Grason-Stadler TDH 39-300Z headphones
in order to encourage them to speak up as loudly as possible and to minimize
their ability to monitor their own utterances. Subjects' responses to the
stimuli were recorded via a Sony F-27S microphone onto a Sony cassette tape
recorder model TC-110B for later analysis.

Subjects were told that the nonsense syllables they would see would be
composed of three vowel sounds, spelled as i, a, and u. They were instructed
to pronounce the letter i as /i/ as in the word eat, the letter a as /a/ as in
the word father, and the letter u as /u/ as in the word boot. They were also
told to pronounce the letter pair th as in the word think, sh as in shoe, and
ch as in church. Subjects were then shown CV nonsense syllable pairs
typewritten on a sheet of paper and asked to read them aloud. Their
pronunciation was checked, and if they did not pronounce the letters as
instructed, they were asked to do so. There were 29 CV nonsense pairs from
the filler set presented first to subjects as practice with the computer
apparatus.

Results 

Subjects' responses to all 1080 CV stimulus pairs were transcribed by one
listener and then checked by another. Across the 13 subjects, there were 185
disagreements (1.3%), which were resolved by relistening to the disputed pairs
until a consensus was reached. A response was scored as an error if the pair
deviated in any way from the stimulus; thus, null responses were scored as
errors. The results for the 120 test stimuli are summarized in Table 1 in terms
of error frequencies as a function of consonant pair and vowel pair.

Table 1

Feature Differences Separating Consonants in a Pair and Error Frequencies for
Test Stimuli in Experiment 1 as a Function of Consonant Pair and Vowel Pair

As is clear from Table 1, the vowel pairs did not have consistent effects
on error rates. An analysis of variance was conducted on the error data
summed across vowel pairs in order to determine the significant effects due to
consonant pairs. Two factors were included in the analysis, one for the ten
different combinations of consonants, and the second to assess the effect of
consonant frequency on error rates, such that the first permutation of the
consonant pair had the more frequent of the two consonants preceding the less
frequent consonant (with frequency determined by Dewey, 1923), and the second
permutation had the less frequent consonant preceding the more frequent one,
as revealed in the ordering of Table 1. Both main effects were significant.
The consonant pairs were significantly different from one another,
F(9,108) = 6.89, p < .0001, and consonant pairs for which the less frequent
consonant preceded the more frequent consonant had a significantly greater
number of errors, F(1,12) = 5.76, p = .0335. The interaction of consonant pairs
and frequency was not significant, F(9,108) = 1.50, p = .1560.

Feature analysis 1. A further analysis was performed on the same data in
order to test the hypothesis that the number of feature differences between
each consonant in a target pair was crucial in determining the error rate.
Since there are a variety of competing feature analyses and since the choice
of a single feature system could bias our results, we chose to contrast two
phonetic feature systems: the well-known system devised by Chomsky and Halle
(1968), henceforth C & H, and another one derived from a corpus of speech
errors in English and German by van den Broecke and Goldstein (1980), henceforth
B & G. First, the consonant pairs were divided into four feature difference
classes according to C & H (see Table 1), and errors were averaged across
consonant pairs in each class. The main effect of feature difference class was
not significant, F(3,36) = 1.09, p = .3672. Furthermore, the error rate did not
monotonically increase or decrease with the number of feature differences, and
the error rate for the consonant pair sh-ch differed greatly from that for
th-t, though both consonant pairs differ on the same single feature.

Next, the consonants were divided into three feature difference classes
according to B & G (see Table 1). With this feature set, the main effect of
feature difference class was significant, F(2,24) = 12.22, p = .0002. The mean
number of errors per subject for consonant pairs differing on one feature was
2.2, on two features, 1.4, and on three features, 1.2.

Substitution errors . A separate analysis was made of substitution er- 
rors, in which the correct consonant in a syllable of a test stimulus was re- 
placed by another consonant in the stimulus set. The resulting confusion ma- 
trix is presented in Table 2. 

In order to determine whether the relative frequency with which each
consonant segment intrudes is the same as the frequency with which it appears as
a target, we computed a χ² statistic comparing the two distributions and found
that they were in fact significantly different from one another, χ²(4) = 69.1,
p < .01. One striking discrepancy between the previous study by
Shattuck-Hufnagel and Klatt (1980) and ours concerns the asymmetrical pattern of
substitutions involving sh and s. In the earlier study, there were more
replacements of s by sh than vice versa, whereas the opposite was found in the
present study. This discrepancy may be attributable in part to visual
factors. Perhaps, consonant segments that contain the same letters (sh/s and
th/t) are particularly likely to be confused, especially in the direction of
letter deletion. An analysis that eliminates such confusions, by combining
the sh and s segments and the th and t segments, yields a marginally
significant difference between the target and intrusion distributions,
χ²(2) = [value illegible], p < .10.

Table 2

Substitution Errors in Experiment 1 as a Function of Target Consonant and
Intrusion Consonant

                          Target
Intrusion     T     S    SH    CH    TH   TOTAL
T             -    10     6     5    27      48
S             6     -    54     6    10      76
SH            4    28     -    49     5      86
CH            6     6    26     -    18      56
TH            5     5    10     9     -      29
TOTAL        21    49    96    69    60     295
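
The first χ² comparison can be illustrated from the marginal totals of Table 2 (intrusion totals 48, 76, 86, 56, 29 and target totals 21, 49, 96, 69, 60 for t, s, sh, ch, th). The sketch below is our own reading of the test, not a description of the authors' procedure: it assumes a goodness-of-fit form in which the intrusion totals are the observed counts and the target totals are the expected counts. Under that assumption the statistic agrees with the value reported in the text.

```python
# Marginal totals read from Table 2 (295 substitution errors in all).
intrusion_totals = {"t": 48, "s": 76, "sh": 86, "ch": 56, "th": 29}
target_totals = {"t": 21, "s": 49, "sh": 96, "ch": 69, "th": 60}


def goodness_of_fit(observed, expected):
    """Pearson chi-square: sum over categories of (O - E)^2 / E."""
    return sum(
        (observed[k] - expected[k]) ** 2 / expected[k] for k in expected
    )


# Assumption: intrusions are treated as observed, targets as expected.
chi2 = goodness_of_fit(intrusion_totals, target_totals)
df = len(target_totals) - 1

print(round(chi2, 1), df)
```

Both sets of totals sum to 295, so the expected counts need no rescaling before the comparison.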

Frequency analysis. To determine whether the incidence of errors for each
target consonant phoneme is related to the log frequency of that segment in
English, we computed a Pearson product-moment correlation coefficient
relating the frequency with which each of the five consonants occurred as a
target to its log frequency in English (Dewey, 1923). As expected according
to the strength explanation, there was a negative correlation, although it did
not reach standard levels of statistical significance, r(3) = -.696, p > .10.
A significant negative correlation was found when the frequency analysis of
Shattuck-Hufnagel and Klatt (1979) was used instead of that of Dewey (1923),
r(3) = -.887, p < .05. This new frequency analysis, henceforth the content
count, was derived from the speech sample of Carterette and Jones (1974) and
includes only content words, not function words or common bound morphemes.⁴
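
With only five consonants, each correlation has 3 degrees of freedom, so a large coefficient is needed to reach significance. A minimal sketch of the usual conversion from r to t, applied to the two target correlations reported above, is given below. The function name is hypothetical, and the critical values are standard two-tailed table values for 3 degrees of freedom; whether the authors used two-tailed tests is an assumption.

```python
import math

# Two-tailed critical values of Student's t for 3 degrees of freedom
# (standard table values).
T_CRIT_DF3 = {0.10: 2.353, 0.05: 3.182}


def t_from_r(r, df):
    """Convert a Pearson r with df = n - 2 into the equivalent t statistic."""
    return r * math.sqrt(df / (1.0 - r * r))


for label, r in [("Dewey (1923) count", -0.696), ("content count", -0.887)]:
    t = t_from_r(r, df=3)
    significant = abs(t) > T_CRIT_DF3[0.05]
    print(f"{label}: r(3) = {r:.3f}, t(3) = {t:.2f}, p < .05: {significant}")
```

On this reading, the Dewey correlation falls short of even the .10 criterion, whereas the content-count correlation exceeds the .05 criterion, in line with the reported significance levels.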

A similar analysis was conducted to compare intrusion frequency and log
frequency in the language. The correlations in this case were not significant
for the Dewey (1923) count, r(3) = .284, p > .10, nor for the content count,
r(3) = -.054, p > .10.

In view of the high correlations for target frequency and despite the low
correlations for intrusion frequency, frequency in the language, in addition
to visual confusions, may be a source of the asymmetry in intrusions noted
earlier. In order to test this hypothesis, for the ten consonant pairs (e.g.,
ch-t), we compared how often the more frequent phoneme intruded for the less
frequent phoneme (t for ch) rather than vice versa (ch for t). For one test
we used the Dewey count, which yielded a significant difference, t(9) = 2.11,
p < .05, and for a second test we used the more recent content count, which
was not significant, t(9) < 1. By both counts, the more frequent phoneme in
the pair intruded more often on the average than did the less frequent
phoneme, in accord with a strength explanation of speech errors.
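
The pairwise asymmetry test just described can be sketched as a one-sample t test on ten difference scores, one per consonant pair. The sketch below is illustrative only. The cell counts are an assumption (Table 2 values, with partly illegible cells filled in so that rows and columns match the printed totals), the rank orders are those given in footnote 4, and the function name is hypothetical. The result therefore need not reproduce the printed statistics exactly.

```python
import math
from itertools import combinations

# Substitution counts, indexed as counts[intrusion][target].
# Assumed cell values: taken from Table 2, with partly illegible cells
# filled in so that rows and columns match the printed totals.
counts = {
    "t":  {"s": 10, "sh": 6, "ch": 5, "th": 27},
    "s":  {"t": 6, "sh": 54, "ch": 6, "th": 10},
    "sh": {"t": 4, "s": 28, "ch": 49, "th": 5},
    "ch": {"t": 6, "s": 6, "sh": 26, "th": 18},
    "th": {"t": 5, "s": 5, "sh": 10, "ch": 9},
}

# Rank orders from most to least frequent (footnote 4).
DEWEY = ["t", "s", "sh", "ch", "th"]
CONTENT = ["t", "s", "th", "ch", "sh"]


def asymmetry_t(order):
    """One-sample t on (frequent-intrudes minus infrequent-intrudes) per pair."""
    # combinations() keeps list order, so `hi` is always the more frequent one.
    diffs = [counts[hi][lo] - counts[lo][hi] for hi, lo in combinations(order, 2)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    return mean / math.sqrt(var / n), n - 1


t_dewey, df_dewey = asymmetry_t(DEWEY)
t_content, df_content = asymmetry_t(CONTENT)
print(f"Dewey count:   t({df_dewey}) = {t_dewey:.2f}")
print(f"content count: t({df_content}) = {t_content:.2f}")
```

With the assumed cell values, the Dewey ordering yields a t above the one-tailed .05 criterion for 9 degrees of freedom (1.833) and the content ordering yields a t below 1, the same pattern of significance as reported.
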

Feature analysis 2. A second feature analysis was performed on the
substitution data to see whether more substitutions of y for x occur when x
and y differ by a single phonetic feature than when they differ by more. For
the C & H features, the mean number of substitution errors involving a change
of one feature was 20, of two features 24, of three features 6, and of four
features 9. Clearly, though one- and two-feature changes are more frequent
than three- and four-feature changes, there is not a monotonic decrease in the
number of substitution errors as the number of feature changes increases.
Indeed, sh-ch and th-t, which differ on the same single feature according to
C & H, show mean substitution rates of 38 and 16, respectively. Furthermore,
there are complementary asymmetries in the substitution rates for these two
pairs (see Table 2) such that the feature change to [+continuant] involves
fewer errors for the pair t-th but more errors for the pair ch-sh.

For the B & G features, the mean number of substitution errors involving 
a change of one feature was 23, of two features, 8, and of three features, 10. 
Although there is not a perfect monotonic decrease in the number of substitu- 
tion errors as the number of feature changes increases, it is clear that the 
single feature substitution errors are most frequent. 

Availability analysis. A further analysis was performed on the substitution
errors to assess the role of segment availability. We determined the number
of times a substitution error of y for x occurred in the environment of y
(i.e., how often did the intrusion phoneme /t/ occur for the target phoneme
/s/ when the test consonant pair was t-s or s-t). By comparing that number to
the overall number of y for x substitutions, we determined the percentage of
times that a substitution occurred when the error was part of the intended
utterance (see Table 3). For substitution errors of y for x, y was part of
the intended utterance 47.5% of the time. Since x was paired with phonemes
other than y three times as often as it was paired with y, the appropriate
chance percentage is 25%. Hence, segment availability in the stimulus does
seem to influence error rate. However, it clearly is not necessary for the
intruding phoneme to be part of the intended utterance, since the majority of
the substitutions of y for x occur when y is not part of the intended
utterance, defined narrowly here as the test CV nonsense syllable pair.
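
The chance level follows from the design: each target consonant is paired equally often with each of the other four consonants, so any particular intruder is present in one quarter of the stimuli containing that target. A small sketch of this arithmetic, with a hypothetical function name, is:

```python
from fractions import Fraction

CONSONANTS = ["t", "s", "sh", "ch", "th"]


def chance_availability(n_consonants):
    """Share of stimuli containing target x in which a given y is the partner."""
    return Fraction(1, n_consonants - 1)


chance = chance_availability(len(CONSONANTS))
observed = 47.5  # percent, as reported for Experiment 1

print(f"chance = {float(chance):.0%}, observed = {observed}%")
print(f"ratio observed/chance = {observed / (100 * float(chance)):.1f}")
```

The observed availability is thus nearly twice the chance level, yet still below one half.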

Furthermore, phoneme frequency seems to influence the importance of
availability. When the direction of the substitution error involves a change
from a relatively more frequent (strong or +) to a relatively less frequent
(weak or -) phoneme (see Table 3), then it is more important that the
infrequent segment be available than when the direction of the substitution
involves a change from a relatively weak to a relatively strong phoneme.
Thus, by the Dewey count of phoneme frequency, when a change involves strong
(+) to weak (-), the weak segment is available 58.1% of the time, whereas when
the change involves weak (-) to strong (+), the strong segment is available
only 41.6% of the time, t(9) = 3.19, p < .05. The same pattern obtains with
the content count (53.8% from strong (+) to weak (-), 42.3% from weak (-) to
strong (+)), although the latter set of differences is not significant,
t(9) < 1.

On the other hand, the availability of the intruding phoneme did not vary
regularly with the number of feature differences separating each consonant
pair. By the C & H feature set, the intruding phoneme was available 42.6% of
the time when there was a single feature difference between the consonants in
a pair, 41.8% of the time when there were two feature differences, 55.3% of
the time when there were three feature differences, and 70.3% of the time when
there were four feature differences. Although this pattern suggests the
possibility that it is more important that the intruding phoneme be available
when consonant pairs differ by three or more features, it is not confirmed in
the pattern of availability for the B & G features. In that case, the
intruding phoneme was available 46.5% of the time when the consonants in a
pair differed on a single feature, 57.6% of the time when they differed on two
features, but only 35.7% of the time when the consonants differed on three
features.

Discussion 

The results of Experiment 1 show that the likelihood of an error occur- 
ring for a given segment in a test pair depends in part on the relative fre- 
quency in English of the individual segments in the pair. Thus, the matrix 
generated by the substitution errors showed significant asymmetry. There was 
a high negative correlation between the frequency of an error occurring for a 
target segment and its log frequency of occurrence in English as well as
evidence that a more frequent segment is more likely to intrude for a less
frequent segment than vice versa.

Segment similarity clearly influences the generation of speech errors,
although the pattern of errors and substitutions is more interpretable when 
segment similarity is based on the B & G rather than the C & H feature set. 

Finally, availability of the source segment along with the target segment 
within the intended utterance, although important, does not seem to be a nec- 
essary factor, but its role increases when the intended segment is higher in 
frequency than the one that replaces it. 

In order to assure that the results of Experiment 1, in which the stimuli
were visually presented, were not an artifact of the visual modality, we
redesigned our materials for auditory presentation in Experiment 2.

Experiment 2 

Tongue twisters (e.g., "she sells sea shells") often result from
conflicting vowel and consonant patterns. For example, there is an ABBA
(/š/-/s/-/s/-/š/) consonant pattern and an ABAB (/i/-/ɛ/-/i/-/ɛ/) vowel
pattern in the well-known tongue twister cited above. Our CV nonsense test
syllables were presented auditorily to subjects in this tongue twister format,
four syllables at a time, such that the consonant pattern of presentation was
ABBA and the vowel pattern ABAB, and subjects were asked to repeat the
sequence of four syllables as quickly and as accurately as possible.

Method 

Stimuli. The test consonant phonemes /s, š, č, t, θ/ and vowels /a, i, u/ of
Experiment 1 were used in Experiment 2. Each of the possible CV nonsense
pairs (eliminating all identical consonant and identical vowel possibilities)
was joined with a CV nonsense pair in which the order of the consonants
changed but the vowels remained the same (e.g., sa tu ta su). There was a
total of 120 such four-syllable stimuli. Each of the original 15 syllables (5
consonants × 3 vowels) was recorded by one of the investigators (AGL),
digitized at 20 kHz, and stored on tape. All of the four-syllable nonsense CV
stimuli were thus produced from the same original 15 syllables. There were
300 ms between syllables in a four-syllable string and a 5 s ISI between
stimuli. There were no distractor or filler sequences.
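
The construction of the stimulus set can be sketched as follows: every ordered pair of distinct consonants is crossed with every ordered pair of distinct vowels, and the second half of each quadruple reverses the consonants while repeating the vowels. The orthographic spellings and the function name are assumptions made for illustration.

```python
from itertools import permutations

CONSONANTS = ["s", "sh", "ch", "t", "th"]
VOWELS = ["a", "i", "u"]


def build_quadruples():
    """Return all four-syllable stimuli with ABBA consonants and ABAB vowels."""
    stimuli = []
    for c1, c2 in permutations(CONSONANTS, 2):   # 20 ordered consonant pairs
        for v1, v2 in permutations(VOWELS, 2):   # 6 ordered vowel pairs
            stimuli.append(f"{c1}{v1} {c2}{v2} {c2}{v1} {c1}{v2}")
    return stimuli


stimuli = build_quadruples()
print(len(stimuli))
print("sa tu ta su" in stimuli)
```

Twenty ordered consonant pairs times six ordered vowel pairs gives the 120 stimuli reported, and the example quadruple from the text is among them.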

Design. The stimuli were presented in pseudorandom order in six blocks
of 20 each with the following constraints: No consonant occurred on two 
successive trials, each of the 20 consonant pairs occurred once in each block, 
and each vowel pair occurred once with each consonant pair in the test and ei- 
ther 3 (4 pairs) or 4 (2 pairs) times per 20-trial block. 

Subjects. Eighteen men and women from the University of Colorado
participated in the experiment and received course credit in an introductory 
psychology class. 

Apparatus and procedure. The stimuli were transmitted to the subject
binaurally through a pair of Telephonics earphones (Model TDH-39). The
stimulus tape was played with a TEAC A-3300S tape recorder at a comfortable
listening level. The subjects spoke into a Superscope Model EC-1 condenser
microphone that was attached to an Optisonics Sound-O-Matic II cassette tape
recorder.






Levitt & Healy: The Roles of Phoneme Frequency, Similarity, and Availability



The subjects were told that they would hear a series of four-syllable
sequences. They were instructed that the four syllables in each sequence
would all be composed of a consonant sound followed by a vowel sound and that
the consonants would always be presented in an ABBA pattern, and the vowels
would always be in an ABAB pattern. They were given as an example the
four-syllable sequence ta-si-sa-ti, which has a t-s-s-t (or ABBA) consonant
pattern and an a-i-a-i (or ABAB) vowel pattern. The subjects were further
told that there were only five different initial consonants (s as in sigh; t
as in tie; th as in thigh; sh as in shy; and ch as in child) and only three
different vowels (/a/ as in cot; /i/ as in eat; and /u/ as in boot).

The subjects' task was to repeat aloud into the microphone each
four-syllable sequence they heard as quickly as possible without making
errors. They were told to try to say all four syllables and guess if
necessary. They were instructed not to worry if they made a mistake or had
trouble repeating a sequence but to listen carefully for the sequence
following the one they missed and to try to keep up with the tape. The
subjects were then given three practice trials spoken by the experimenter
(sa-ti-ta-si; chu-tha-thu-cha; shi-su-si-shu).

Results 

Subjects' responses to all 120 test stimulus quadruples were transcribed
by one listener and then checked by another. Across the 18 subjects, there
were 340 discrepancies (3.9%), which were resolved by a third listener.
However, since a great number of these disagreements involved confusions of
/θ/ and /f/ and since /f/ was not a possible stimulus, all responses of /f/
were replaced by /θ/ (there were 718 /f/ responses [8.3%] that were replaced
in this way). Each syllable was scored separately and was determined to be an
error if it deviated in any way from the stimulus. The results for the 120
test stimuli are summarized in Table 4 in terms of error frequencies as a
function of consonant pair (ABBA) and vowel pair (ABAB).
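
The scoring rule can be sketched as below. This is an illustration under assumptions, not the transcription procedure itself: responses are assumed to be spelled orthographically, missing syllables are assumed to count as errors, and the function names are hypothetical. The percentage check assumes that the reported percentages are based on all 18 × 120 × 4 syllables.

```python
def recode(syllable):
    """Replace an initial 'f' with 'th', since /f/ never occurred as a stimulus."""
    return "th" + syllable[1:] if syllable.startswith("f") else syllable


def score(stimulus, response):
    """Return one True/False error flag per syllable position."""
    stim = stimulus.split()
    resp = [recode(s) for s in response.split()]
    # Assumption: syllables the subject failed to produce count as errors.
    resp += [""] * (len(stim) - len(resp))
    return [s != r for s, r in zip(stim, resp)]


total_syllables = 18 * 120 * 4   # subjects x stimuli x syllables
print(score("tha si sa thi", "fa si sha thi"))
print(f"{340 / total_syllables:.1%}, {718 / total_syllables:.1%}")
```

In the example call, the initial "fa" is recoded to "tha" and scored as correct, and only the third syllable is flagged as an error. The two percentages agree with the reported 3.9% and 8.3%.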

As in Experiment 1, the vowel pairs did not have consistent effects on
error rates. An analysis of variance was conducted on the error data summed
across vowel pairs to assess the effects due to consonant pairs. The
consonant pairs differed significantly from one another, F(9, 153) = 14.17,
p < .0001, and the quadruples for which the less frequent sound was heard
first had significantly more errors, F(1, 17) = 15.92, p = .0009.

Feature analysis 1. As for Experiment 1, the consonant pairs were first
divided into four feature-difference classes by the C & H feature system (see
Table 4), and errors were averaged across consonant pairs in each class. The
main effect of feature-difference class was marginally significant,
F(3, 51) = 2.59, p = .0632, but the error rate again did not monotonically
increase or decrease with the number of feature differences.

Next, the consonant pairs were divided into three feature difference 
classes by the B & G feature system (see Table 4). The main effect of feature 
difference class was significant, F(2, 34) = 16.24, p < .0001. The mean number of
errors per subject for consonant pairs differing on one feature was 11.2, on 
two features, 9.2, and on three features, 8.1. 
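
The degrees of freedom of the F ratios in this section are consistent with a within-subject design with 18 subjects. A brief sketch of that bookkeeping follows. It assumes a one-way repeated-measures layout for each effect, which is an inference from the reported degrees of freedom rather than something stated in the text, and the function name is hypothetical.

```python
N_SUBJECTS = 18


def within_subject_df(n_levels, n_subjects=N_SUBJECTS):
    """Degrees of freedom (effect, error) for a one-way repeated-measures F."""
    effect = n_levels - 1
    error = effect * (n_subjects - 1)
    return effect, error


for name, levels in [
    ("consonant pair", 10),
    ("order of the less frequent sound", 2),
    ("C & H feature-difference class", 4),
    ("B & G feature-difference class", 3),
]:
    print(name, within_subject_df(levels))
```

The four effects give (9, 153), (1, 17), (3, 51), and (2, 34), matching the degrees of freedom reported above.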

Table 4

Feature Differences Separating Consonants in a Pair and Error Frequencies in
Experiment 2 as a Function of Consonant Pair (ABBA) and Vowel Pair (ABAB)

Table 5

Substitution Errors in Experiment 2 as a Function of Target Consonant and
Intrusion Consonant

Substitution errors. As in Experiment 1, a separate analysis was made of
the substitution errors in which the correct consonant sound was replaced by
another consonant in the stimulus set (see Table 5). To evaluate the extent
to which the relative frequency that each consonant segment intruded
corresponds to the frequency that it appeared as a target, we computed a χ²
statistic comparing the two distributions and found that they were in fact
significantly different from each other, χ²(4) = 391.8, p < .01, as in
Experiment 1. Also in agreement with Experiment 1, but unlike the study by
Shattuck-Hufnagel and Klatt (1980), we found more replacements of sh by s than
vice versa.

Frequency analysis. As for Experiment 1, we computed two sets of
correlation coefficients to determine the relation between the log frequency
in the language of a given consonant segment and its frequency of occurrence
as a target or intrusion. For targets, the correlations were negative, as
expected, but nonsignificant for both the Dewey count, r(3) = -.352, p > .10,
and the content count, r(3) = -.658, p > .10. For intrusions, the
correlations were positive but not significant, for the Dewey count,
r(3) = .331, p > .10, and for the content count, r(3) = .505, p > .10. To
evaluate whether frequency in the language may account for the asymmetry in
intrusion errors, we compared how often the more frequent phoneme in a pair
intruded for the less frequent phoneme rather than vice versa. The more
frequent phoneme intruded more often on the average for both counts. This
difference was significant by the content count, t(9) = 3.20, p < .05, but not
by the Dewey count, t(9) < 1.

Feature analysis 2. For the C & H features, the mean number of
substitution errors involving a change of one feature was 174, of two features
172, of three features 157, and of four features 114. Although substitution
errors monotonically decreased as feature differences increased, again, sh-ch
and th-t, which differ on the same single feature according to C & H, show
mean substitution rates of 212 and 140, respectively.

For the B & G features, the mean number of substitution errors involving 
a change of one feature was 180, of two features, 152, and of three features, 
120. Again, there is a monotonic decrease as the number of feature differ- 
ences increases. 

Availability analysis. For substitutions of y for x, y was part of the
intended utterance 41.6% of the time (see Table 6), a percentage which is
substantially higher than that expected on the basis of chance alone (25%).
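
How far the observed percentage lies above chance can be gauged with a rough normal approximation. The sketch below is not an analysis reported by the authors. It assumes that the TOTAL row of Table 6 gives 1302 available-intruder substitutions out of 3132 substitutions in all, and it treats substitutions as independent events, which is a simplification because each subject contributed many errors.

```python
import math

available = 1302   # assumed: substitutions of y for x with y in the stimulus
total = 3132       # assumed: all substitutions of y for x (Table 6, TOTAL row)
chance = 0.25      # one of four possible partner consonants

p_hat = available / total
se = math.sqrt(chance * (1 - chance) / total)   # standard error under chance
z = (p_hat - chance) / se

print(f"observed = {p_hat:.1%}, z = {z:.1f}")
```

Under these assumptions the observed proportion reproduces the reported 41.6% and lies many standard errors above the 25% chance level.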

Phoneme frequency again appears to have an effect on the importance of
availability. When the direction of substitution goes from a strong (+) to a
weak (-) segment, the weak segment is available 47.9% of the time by the
content count, and when the direction of substitution goes from a weak segment
(-) to a strong segment (+), the strong segment is available 37.3% of the time
by the content count, t(9) = 2.93, p < .05. The same pattern obtains with the
Dewey count (42.1% from strong (+) to weak (-), 41.0% from weak (-) to strong
(+)), although the latter set of differences was not significant, t(9) < 1.

We found only a slight trend indicating that the intruding phoneme is
less available when consonant pairs differ by a single feature than when they
differ by more features for either feature set. For C & H, the intruding
phoneme was available 37.9% of the time when there was a single feature
difference between the consonants in a pair, 45.7% of the time when there were
two feature differences, 42.2% of the time when there were three feature
differences, and 42.4% of the time when there were four feature differences.
For B & G, the intruding phoneme was available 40.8% of the time when there
was a single feature difference between the consonants in a pair, 42.3% of the
time when there were two feature differences, and 42.0% of the time when there
were three feature differences.

Table 6

Relative Frequency of Target Phoneme (x) and Intruding Phoneme (y) and
Percentage of Errors of the Type x Changes to y When y Was Available
in the Stimulus in Experiment 2

TOTAL     1302     3132     41.6

General Discussion 

Comparison of Experiments 1 and 2. In Experiment 1 we considered the
possibility that visual confusions contributed to the error pattern in that
experiment. Experiment 2 provides an important control, since the stimuli in
Experiment 2 were presented auditorily. Once we had corrected in Experiment 2
for the common auditory confusion of /f/ and /θ/, we found that the results of
the two experiments were very similar. In fact, the Pearson product-moment
correlation coefficient comparing the target phoneme frequencies in
Experiments 1 and 2 showed a significant correlation, r(3) = .915, p < .05.
When the intrusion phoneme frequencies of the two experiments were compared,
we found a nonsignificant negative correlation, r(3) = -.250, p > .10.
Although the exact patterns of intrusions for the two experiments did not
correspond, the t tests reported earlier did show an effect of phoneme
frequency on intrusions for both experiments. The error frequencies for the
twenty consonant pairs themselves were also highly correlated in the two
experiments. Recall that the test stimuli in Experiment 1 were CV nonsense
pairs (e.g., ta si) and in Experiment 2 they were CV nonsense quadruples
(e.g., ta si sa ti). Thus error frequencies for the stimuli in Experiment 1
were compared separately for the first two and the second two syllables in
Experiment 2. When the number of errors for each CV pair in Experiment 1 was
compared with the number of errors for the first two syllables of the CV
nonsense quadruple in Experiment 2, the resulting correlation was
statistically significant, r(18) = .772, p < .01. When the error frequencies
of Experiment 1 were compared with those of the second two syllables of the CV
nonsense quadruples of Experiment 2, the correlation was again statistically
significant, r(18) = .539, p < .05. Finally, the error frequencies for the
first two and the second two syllables of the CV nonsense quadruples in
Experiment 2 were compared and again the correlation was significant,
r(18) = .772, p < .01. Though visual confusions may have had a small effect
on the error pattern of Experiment 1 and auditory confusions (most clearly
those involving /f/ and /θ/) did occur in Experiment 2, the patterns of errors
in the two experiments are clearly very similar. These patterns point to the
importance of phoneme frequency in the error generation process.

The role of phoneme frequency. We can see two ways in which phoneme
frequency had an effect on our results. In the first place, when we examined
our data for errors of any type, we found in both experiments that consonant
pair stimuli in which the first consonant was less frequent than the second
(e.g., ch-t) tended to produce more errors than consonant pair stimuli in
which the first consonant was more frequent (e.g., t-ch).

In the second place, when we examined substitution errors restricted to
the test consonant set, we found that phoneme frequency in English showed a
negative correlation with target frequencies. We also found, when we looked
at the ten consonant combinations, that the more frequent phoneme of the pair
was more likely to intrude as an error for the other member than vice versa.
These findings lend support to an explanation of the error generation process
in which phoneme strength is determined by phoneme frequency. Thus we find a
negative correlation between target phoneme frequency and frequency of
occurrence in English because more frequent or stronger phonemes are less
likely to function as targets or mispronounced segments. On the other hand,
more frequent or stronger phonemes are somewhat more likely to function as
intrusions.

These effects of frequency emerge in the experimental elicitation of
errors because we were able to control the prior probabilities of occurrence
of the individual phonemes. With equal prior probabilities, we find an
asymmetrical pattern of substitution errors. However, the asymmetrical
pattern that emerges from our data is different from the one found initially
by Shattuck-Hufnagel and Klatt: We find no evidence for a palatalizing
mechanism, since we find more non-palatalizing (e.g., sh to s) than
palatalizing (e.g., s to sh) substitution errors in both experiments.

There is always the danger in an experimental situation that some factor
that does not operate in the spontaneous error generation process was
introduced. We used nonsense syllables as stimuli rather than English words,
in order to eliminate effects of lexical frequency and lexical bias in the
error generation process, but nonsense syllables may behave differently than
English words. For example, in an experiment designed to elicit speech
errors, in which she had subjects read or recall tongue twisters,
Shattuck-Hufnagel (1982) found a differential pattern of errors in the recall
condition when she compared CVC nonsense syllables, CVC English words, CVC
nonsense syllables embedded in short phrases, and CVC English words embedded
in short phrases. Only the CVC English words embedded in short phrases showed
a higher percentage of word-initial errors (as is found in naturally occurring
speech errors), whereas all the other eliciting sets showed a higher
percentage of word-final errors. However, this result was obtained largely
through a reduction in the number of word-final errors in the CVC English
words embedded in phrases as compared to the other conditions. Furthermore,
since we used only CV syllables in our study, and we found very few vowel
errors, our errors were almost entirely in word-initial position. Thus, we
believe that the differences between our findings and those of
Shattuck-Hufnagel and Klatt (1980) are due largely to the differences in prior
probabilities of the phoneme targets and are not due to factors introduced by
our experimental method or use of nonsense syllables.

In our view, phoneme strength is a function of phoneme frequency rather
than ease of articulation, age of acquisition, or status in phonological
theory of the phoneme in question. With respect to articulation, in comparing
/s/ and /š/, Borden and Harris (1980) point out that "a wide range of openings
beyond those for /s/ result in /š/ type sounds" (p. 121). So we see that
articulation of the phoneme /s/ is more precise and therefore presumably more
difficult (see also Anderson, 1942). In contrast, there are claims in the
literature (e.g., Lester & Skousen, 1974) that /s/ is acquired earlier than
/š/. Closer examination of the data reveals that children often produce an
/s/-like fricative phoneme or stop where the adult model has an /s/ (Ferguson,
1978; Moskowitz, 1973) before they produce a phoneme for words in which the
adult model has an /š/, probably because of the higher frequency of /s/.
However, that correct articulation of /s/ is often acquired rather late is
clear from reports of speech therapists (Anderson, 1942; Berry & Eisenson,
1947) and others (Ingram, Christensen, Veach, & Webster, 1980; Sander, 1972;
Velleman, 1983) who attest to its difficulty. Finally, although in
phonological markedness theory, as outlined by Chomsky and Halle (1968), /s/
is less marked than /š/, in a more general test of phonological markedness in
the elicitation of speech errors, Motley and Baars (1975) did not find
markedness to be a significant factor.

Frequency in the language is then for us the best index of a phoneme's
strength. We believe that frequent phonemes are "stronger" than infrequent
ones because they are the more common of highly overlearned motor patterns.
In this view, we see single segment errors involving similar segments as
examples of Norman's (1981) capture errors: "when a sequence being performed
is similar to another more frequent or better learned sequence, the latter may
capture control" (p. 6). The initial gestures relevant to the pronunciation
of /s/ and /š/ are no doubt very similar, if not identical. It is easy to see
how the gestures to produce an /š/ could be "captured" by the more frequent
/s/ gestures.

Segment similarity. Do speech error rates or patterns of substitutions
depend on minimal feature differences between consonant pairs? The answer to
such a question depends on the feature system one chooses. Ideally, one would
like to find that a system motivated on independent grounds, such as the one
devised by Chomsky and Halle (1968), also captures in a principled way the
structural relationships in speech errors. Indeed, van den Broecke and
Goldstein (1980) compared a number of feature systems, along with the one they
devised on the basis of English and German speech errors, and found that
"feature systems designed without incorporating evidence from speech errors
are all capable of showing meaningful structure in phonological speech errors
as they occur" (p. 63). Nonetheless, segment similarity emerges as a
significant effect in our data only when we use the B & G features to
determine segment similarity. That the segment similarity effects in our data
are best demonstrated by the B & G features, derived from the analysis of
naturally occurring speech errors in English and German, suggests that the
errors we find in our experimental situation are analogous to those occurring
in collections of naturally occurring utterances.

Availability. When naturally occurring speech errors are analyzed, the
assumption is often made that errors are most likely to occur when similar
segments are simultaneously available. Yet the results of our experiments
suggest that availability, here defined in narrow terms as a substitution of x
for y when x is part of the stimulus, is important but not necessary, since
the percentage of the x for y substitutions in both experiments that occur
when x is part of the stimulus is substantially greater than the chance value
but no greater than 50%. Indeed, the substitution errors in the corpus
examined by Shattuck-Hufnagel and Klatt (1979) include 30% with no known
source. It is possible that the actual proportion of naturally occurring
speech errors that have no source in the surrounding context might be higher
than that estimated by Shattuck-Hufnagel and Klatt, and it might be wrong to
assume in such cases that the intruding error was part of the intended
utterance (see Harley, 1984, for a discussion of higher level
non-plan-internal errors). Finally, we find that segment availability becomes
increasingly important as the frequency of the intruded phoneme decreases and
perhaps, to a lesser extent, as the featural similarity between the intruded
and target phonemes decreases.

However, it is difficult to compare the relative magnitudes of the effects
of phoneme frequency and availability (see Sechrest & Yeaton, 1982).
Moreover, the influence of phoneme frequency on the importance of availability
suggests that both effects may stem from the same activation mechanism. The
frequency effect may be reflecting differences in the base activation levels
of phonemes, whereas the availability effect may reflect transient increases
in phoneme activation that result from being part of the intended utterance.⁵

Conclusions. The results of our two experiments provide support for an
explanation of the speech error generation process in which a segment's
strength is a function of its frequency of occurrence in English: Weak (or
infrequent) segments tend to serve as targets whereas strong (or frequent)
segments tend to serve as intrusions. The role of phoneme frequency is a
consistently important one. Phoneme availability also plays a role, though
perhaps more restricted than expected. Furthermore, availability may be
reflecting the same activation mechanism responsible for the frequency effect.
Finally, the notion that the segments that interact in speech errors are
likely to be similar is best supported by our data when segment similarity is
defined in terms of a feature set derived from naturally occurring speech
errors.

Footnotes

¹ Motley and Baars (1975) found experimental evidence that consonant
frequency in initial position affects the tendency of initial consonants in
pairs of CVC nonsense words to interchange. Hence, frequency in the language
seems like an appropriate initial index of phoneme strength.

² Although none of the CV nonsense pairs represented common lexical items
as visually presented, six of them did represent common lexical items as
pronounced: si = 'see'; shi = 'she'; ti = 'tea'; su = 'sue'; shu = 'shoe';
tu = 'two.'

³ It is possible that this rapid reading procedure is influenced by
articulatory interference of the type involved in tongue twisters as well as
by the factors producing higher-level slips of the tongue. However, Cohen
(1973) found that the pattern of speech errors induced via a rapid reading
procedure was of a very similar nature to that of a naturally collected
corpus.

⁴ The rank order of the consonant phonemes by the Dewey (1923) count is
t>s>sh>ch>th, whereas by the content count it is t>s>th>ch>sh.

⁵ We are indebted to Marcel Just for making this point.

ON LEARNING TO SPEAK*



Michael Studdert-Kennedy†



Abstract. Every language, spoken or signed, deploys a large lexi-
con, made possible by permutation and combination of a small set of
linguistic elements. In speech, rapid interleaving of the gestures 
that form these elements (consonants and vowels) leads to a complex 
acoustic signal in which the boundaries between elements are lost. 
However, for the child learning to speak, the initial task is not to 
recover these elements, but simply to imitate the sound pattern that 
it hears. Studies of "lipreading" in adults and infants suggest 
that imitation is mediated by an amodal representation, closely 
related to the dynamics of articulation, and that a left-hemisphere 
perceptuomotor mechanism specialized to make use of this representa-
tion develops during the first six months of life. By drawing on
this specialized mechanism, the infant learns the recurrent patterns 
of acoustic structure and articulatory gestures from which linguis- 
tic segments must be presumed to emerge. 

As a system of animal communication, language has the distinctive proper- 
ty of being open, that is, fitted to carrying messages on an unlimited range 
of topics. Human cognitive capacity is, of course, greater than that of other 
animals, but this may be a consequence as much as a cause of linguistic range. 
Other primate communication systems have a limited referential scope — sources 
of food or danger, personal and group identity, sexual inclination, emotional
state, and so on — and a limited set of no more than 10-40 signals (Wilson, 
1975, p. 183). In fact, 10-40 holistically distinct signals may be close to 
the upper range of primate perceptual and motor capacity. The distinctive 
property of language is that it has finessed that upper limit, by developing a 
double structure, or dual pattern (Hockett, 1958). 

The two levels of patterning are phonology and syntax. The first permits 
us to develop a large lexicon, the second permits us to deploy the lexicon in 
predicating relations among objects and events. My present concern is entire-
ly with the first level. A six-year-old middle-class American child already
recognizes some 13,000 words (Templin, 1957), while an adult's recognition vo-
cabulary may be well over 100,000. Every language, however primitive the
culture of its speakers by Western standards, deploys a large lexicon. This 
is possible because the phonology, or sound pattern, of a language draws on a 
small set (roughly between 20 and 100 elements) of meaningless units — conso- 
nants and vowels — to construct a very large set of meaningful units, words (or 
morphemes). These meaningless units may themselves be described in terms of a
smaller set of recurrent, contrasting phonetic properties or features. 
Evidently, there emerged in our hominid ancestors a combinatorial principle 
(later, perhaps, extended into syntax) by which a finite set of articulatory 
gestures could be repeatedly permuted to produce a very large number of dis-
tinctively different patterns.
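
A rough count illustrates how quickly a small inventory yields a large set of distinct forms. The sketch below is only an illustration under assumed values: the inventory sizes (24 consonants, 12 vowels), the function name, and the restriction to words of one to three consonant-vowel syllables are hypothetical and do not come from the text. Real languages admit far fewer forms because of phonotactic constraints.

```python
def count_cv_forms(n_consonants, n_vowels, max_syllables):
    """Count distinct strings made of 1..max_syllables CV syllables.

    Assumes every consonant may combine with every vowel and that
    syllables may follow one another freely (an idealization).
    """
    syllables = n_consonants * n_vowels  # number of distinct CV syllables
    return sum(syllables ** k for k in range(1, max_syllables + 1))


if __name__ == "__main__":
    # Hypothetical inventory: 24 consonants, 12 vowels, up to 3 syllables.
    print(count_cv_forms(24, 12, 3))
```

Even under these restrictive assumptions the count runs to tens of millions, several orders of magnitude beyond an adult vocabulary of around 100,000 words.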

"Articulatory gesture" refers, at a gross level, to opening and closing 
the mouth. Repeated constriction of the vocal tract, somewhere between lips 
and glottis to form consonants, and repeated opening of the tract by lowering 
the jaw to form vowels, give rise to the basic consonant-vowel syllable from
which the sound patterns of all spoken languages are formed. The varying 
phonetic qualities of consonants and vowels are determined by the precise 
shape of the vocal tract through which sound — the buzz of vocal fold vibration 
or the hiss of air blown through a narrow constriction — is filtered. The 
shape of the resonating cavities of the vocal tract is determined by fine 
positioning of the articulators: raising, lowering, fronting or backing the 
tip, blade or body of the tongue, raising or lowering the velum, rounding or 
spreading the lips, and so on. 

Thus, permutation and combination of some two dozen gestures provide 
"...a kind of impedance match between an open-ended set of meaningful symbols 
and a decidedly limited set of signaling devices" (Studdert-Kennedy & Lane, 
1980, p. 35). Yet permutation and combination alone would not suffice for a 
flexible and open-ended system of communication, if the gestures were not 
executed rapidly enough to evade the limits of short-term memory and to match 
the natural rate of thought and action. 

What this "natural rate" may be we do not know. But for English, at 
least, a typical rate of speech is of the order of 150 words/min. This 
reduces to roughly 10 to 15 phonemes (consonants and vowels)/s. As Cooper has 
remarked, such rates can be achieved "...only if separate parts of the articu- 
latory machinery — muscles of the lips, tongue, velum, etc. — can be separately 
controlled, and if... a change of state for any one of these articulatory 
entities, taken together with the current state of others, is a change
to... another phoneme.... It is this kind of parallel processing that makes it
possible to get high-speed performance with low-speed machinery" (Liberman,
Cooper, Shankweiler, & Studdert-Kennedy, 1967, p. 446). Thus, repeated use of
a small set of interleaved gestures may not only expand the potential lexicon,
but also ensure rapid execution of its elements. 
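
The conversion from words per minute to phonemes per second can be checked with a line of arithmetic. The sketch below assumes an average of four to six phonemes per word; that range is not stated in the text and is simply what the quoted figures imply. The function name is illustrative only.

```python
def phonemes_per_second(words_per_minute, phonemes_per_word):
    """Convert a speaking rate in words/min to phonemes/s."""
    return words_per_minute * phonemes_per_word / 60


if __name__ == "__main__":
    # Assumed average word lengths of 4 and 6 phonemes (hypothetical values).
    for length in (4, 6):
        print(length, phonemes_per_second(150, length))
```

At 150 words/min, the assumed word lengths of four and six phonemes give 10 and 15 phonemes/s, which brackets the range cited above.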

Let me conclude this brief introduction by noting that the dual motoric 
structure of spoken language has no known parallel in any other system of ani-
mal behavior, except manual-facial sign languages. Over the past 15-20 years
we have learned that American Sign Language (ASL), the first language of over
100,000 deaf persons, and the fourth most common language in the United States
(Mayberry, 1978), is a fully independent language with its own characteristic
formational ("phonological") structure and syntax (Klima & Bellugi, 1979).
Whether signed language is a mere analog of spoken language or a true homolog,
drawing on the same neural structures, we do not yet know — although studies of
sign language deficits following left hemisphere lesion reveal remarkable par-
allels with aphasic deficits of spoken language users (e.g., Kimura, Battison,
& Lubert, 1976; Poizner, Bellugi, & Iragui, in press). 

In any event, my point here is simply that each ASL sign is formed by 
combining four intrinsically meaningless components: a hand configuration, a 
palm orientation, a place in the body space where it is formed, and a move-
ment. These four classes of component, like the two segmental classes of spo-
ken language (consonants and vowels), may also be described in terms of a
smaller set of recurrent, contrasting features (e.g., Klima & Bellugi, 1979, 
Chapter 7). There are some fifty values, or "primes," distributed across the 
four dimensions and their combination in a sign follows "phonological rules," 
analogous to those that constrain the structure of a syllable in spoken 
languages. In short, both spoken and signed languages exploit combinatorial 
principles of lexical formation. Moreover, it would seem that short-term mem-
ory and cognitive capacity have constrained signed and spoken languages to
similar rates of communication. For, although each ASL sign takes roughly 
three times as long to form as an English word, the proposition rates in the 
two languages are almost identical (Klima & Bellugi, 1979). This is possible 
because, while the phonological and syntactic structures of a spoken language 
are largely implemented by sequential organization over time, a signed lan- 
guage can exploit simultaneous manual and facial gestures distributed in 
space. Thus, both types of languages are grounded in a capacity for rapid, 
precise, and precisely coordinated movements of a small set of articulators. 

In what follows, I shall have little further to say about signed
languages. Here, I simply note two points. First, we do not talk with our 
toes, and we may doubt whether any imaginable system of human articulators, 
other than those of the hand and mouth, would be capable of the motor speed
and precision necessary to implement language, as we know it. Second, 
whatever the evolutionary sequence may have been, the well-established (albeit 
imperfect) correlation between hemispheric specializations for language and 
manual praxis is, I assume, not mere coincidence; in all likelihood, the two
modes of language draw on closely related neural structures. 

I have dwelt so far on motor requirements. But there are perceptual de- 
mands also. If spoken language is indeed constructed from rapid sequences of 
consonants and vowels, the listener must somehow extract these recurrent ele- 
ments from the signal. Yet, from the earliest spectrographic studies (Joos, 
1948) it has been known that the acoustic flow of speech cannot be readily di-
vided into an alphabetic sequence of invariant segments corresponding to the
invariant segments of linguistic description. The reason for this is simply 
that we do not speak segment by segment, or even syllable by syllable. At any 
instant, the several articulators are executing a complex, interleaved pattern 
of movements, of which the spatio-temporal coordinates reflect the influence 
of several neighboring segments. (The reader may test this by slowly utter-
ing, for example, the words call and keel. The reader will find that the
position of the tongue on the palate during closure for the first consonant,
/k/, is slightly farther back for the first word than for the second.) The
consequence of this imbricated pattern of movement is, of course, an imbricat- 
ed pattern of sound, such that any particular acoustic segment typically spec- 
ifies more than one linguistic segment, while any particular linguistic seg- 
ment is specified by more than one acoustic segment (Fant, 1962; Liberman et 
al., 1967). This lack of isomorphism between acoustic and linguistic struc-
ture is the central unsolved problem of speech perception. Its continued
recalcitrance is reflected in the fact that we are little closer to automatic
phonetic transcription of speech now than we were thirty years ago (Levinson &
Liberman, 1981). 

Many different approaches to the problem have been proposed, but I will 
not review them here (see Studdert-Kennedy, 1980, for fuller discussion). 
Instead, I will attempt to recast the problem by setting aside, for the mo- 
ment, the discrepancy between acoustic signal and linguistic description, and 
simply asking what we know about how a child learns to speak. I shall assume 
that, whatever the process, it is sufficiently general to permit the deaf 
child to learn to sign with as much ease as a hearing child learns to speak. 
I note, further, that when a child learns to sign or speak, it learns a 
specific dialect. That is to say, it gradually discovers, in the detailed 
acoustic or optic patterns of its caretakers' signals, specifications for a no
less detailed pattern of motor organization. 

Stated in this way, the problem becomes a special case of the general 
problem of imitation. Relatively few species imitate. The higher primates 
imitate general bodily actions, but vocal imitation is peculiar to a few 
species of songbirds, certain marine mammals, and humans. The capacity to 
imitate is evidently a rare, specialized capacity for discovering links be- 
tween perceived movements and their corresponding motor controls. 

We may gain insight into the bases of speech imitation from recent stud-
ies of "lip-reading" in adults and infants. That adults can learn to lip-read
is, of course, a commonplace of aural rehabilitation, but the theoretical 
implications of this capacity have only recently begun to emerge. McGurk and 
MacDonald (1976) demonstrated that listeners' perceptions of a spoken syllable 
often change, if they simultaneously watch a video display of a speaker 
pronouncing a different syllable. For example, if listeners are presented 
with the acoustic syllable [ba] repeated four times, while watching a syn-
chronized optic display of a speaker articulating [ba, va, ða, da], they will
typically report the latter, optically specified sequence. That the effect is 
not simply a matter of visual dominance in a sensory hierarchy (Marks, 1978) 
is evidenced by the fact that certain combinations (e.g., acoustic [ba] with 
optic [ga]) may be perceived as clusters ([bga] or [gba]), or even as syllables
corresponding to neither display ([da]). Thus listeners' percepts seem to
arise from a process by which two distinct sources of information, acoustic 
and optic, are actively combined at an abstract level where each has already 
lost its distinctive sensory quality. (For fuller discussion, see Summer-
field, 1979.)

Further evidence for an amodal representation of speech comes from a
cross-modal study of the so-called suffix effect by Campbell and Dodd (1980). 
A standard finding of short-term memory studies is that listeners, recalling a 
list of auditorily presented words, recall those at the end better than those 
in the middle (recency effect). The effect is reduced if the list is present- 
ed graphically. Moreover, Crowder and Morton (1969) demonstrated that the ef- 
fect could be abolished, or significantly reduced, if a spoken word was 
appended to the list, not for recall but simply as a signal to begin recall 
(suffix effect). Presumably, the suffix "interferes" in some way with the 
representation of recent items. That this representation is at some relative- 
ly "low," yet structured, level is argued by the facts that the effect (1) is 
unaffected by degree of semantic similarity between suffix and list, (2) is 
reduced if suffix and list are presented to opposite ears, (3) does not occur 
if the suffix is a tone or burst of noise. 
Campbell and Dodd (1980) used this paradigm to test listeners' recall of
digits, either lip-read (without sound) or presented graphically, with and 
without the spoken suffix, "ten" (heard, but not seen). They found signif- 
icant recency and suffix effects for the lip-read, but not for the graphic, 
lists. In a complementary study, Spoehr and Corin (1978) demonstrated that a 
lip-read suffix reduced recall of auditorily presented lists. Evidently, 
speech heard, but not seen, and speech seen, but not heard, share a common
representation. Moreover, the fact that Campbell and Dodd did not find a suf- 
fix effect for graphically presented lists suggests that this shared represen- 
tation is not at some abstract, phonological level where spoken and written 
language converge. Rather, these studies, like that of McGurk and MacDonald
(1976), hint at a representation in some form common to both the light
reflected and the sound radiated from mouth and lips. 

Consider, now, that infants are also sensitive to structural correspond- 
ences between the acoustic and optic specifications of an event. Spelke 
(1976) showed that four-month-old infants preferred to watch the film (of a 
woman playing "peekaboo," or of a hand rhythmically striking a wood block and 
a tambourine with a baton) that matched the sound track they were hearing. 
Dodd (1979) showed that four-month-old infants watched the face of a woman 
reading nursery rhymes more attentively when her voice was synchronized with
her facial movements than when it was delayed by 400 ms. If these preferences
were merely for synchrony, we might expect infants to be satisfied with any
acoustic-optic pattern in which moments of abrupt change are arbitrarily syn-
chronized. Thus, in speech they might be no less attentive to an articulating
face whose closed mouth was synchronized with syllable amplitude peaks and 
open mouth with amplitude troughs than to the (natural) reverse. However, 
Kuhl and Meltzoff (1982) showed that four- to five-month-old infants looked 
longer at the face of a woman articulating the vowel they were hearing (either 
[i] or [a]) than at the same face articulating the other vowel in synchrony.
Moreover, the preference disappeared when the signals were pure tones, matched
in amplitude and duration to the vowels, so that the infant preference was
evidently for a match between a mouth shape and a particular spectral struc- 
ture. Similarly, MacKain et al. (1983) showed that five- to six-month-old 
infants preferred to look at the face of a woman repeating the disyllable they
were hearing (e.g., [zuzi]) rather than at the synchronized face of the same woman
repeating another disyllable (e.g., [vava]). 

In both these studies, the infants' preferences were for natural
structural correspondences between acoustic and optic information. Both stud- 
ies hint at infant sensitivity to intermodal correspondences that could play a 
role in learning to speak. However, I am not suggesting that optic informa-
tion is necessary, since the blind infant also learns to speak.1 My intent
rather is to gain leverage on the puzzle of imitation. What we need therefore
is to establish that the underlying metric of auditory-visual correspondence
is related to that of the auditory-motor correspondence required for an
individual to imitate the utterances of another. 
To this end we may note, first, the visual-motor link evidenced in the
capacity to imitate facial expression and, second, the association across many 
primate species between facial expression and pattern of vocalization (Hooff, 
1976; Marler, 1975; Ohala, 1983). Recently, Field et al. (1982) reported that 
36-hour-old infants could imitate the "happy, sad and surprised" expressions 
of a model. However, these are relatively stereotyped emotional responses 
that might be evoked without recourse to the visual-motor link required for
imitation of novel movements. More striking is the work of Meltzoff and Moore
(1977) who showed that 12- to 21-day-old infants could imitate both arbitrary
mouth movements, such as tongue protrusion and mouth opening, and (of particu-
lar interest for the acquisition of ASL) arbitrary hand movements, such as
opening and closing the hand by serially moving the fingers. Here mouth open-
ing was elicited without vocalization; but had vocalization occurred, its
structure would, of course, have reflected the shape of the mouth. Kuhl and
Meltzoff (1982) do, in fact, report as an incidental finding of their study
that 10 of their 32 four- to five-month-old infants "...produced sounds that re-
sembled the adult female's vowels. They seemed to be imitating the female
talker, 'taking turns' by alternating their vocalizations with hers"
(p. 1140). If we accept the evidence that the infants of this study were 
recognizing acoustic-optic correspondences, and add to it the results of the 
adult lipreading studies, calling for a metric in which acoustic and optic 
information are combined, then we may conclude that the perceptual structure 
controlling the infants' imitations was specified in this common metric. 

Evidently, the desired metric must be "...closely related to that of 
articulatory dynamics" (Summerfield, 1979, p. 329). Following Runeson and
Frykholm (1981) (see also Summerfield, 1980), we may suppose that in the visu-
al perception of an event we perceive not simply the surface kinematics (dis- 
placement, velocity, acceleration), but also the underlying biophysical prop- 
erties that define the structure being moved and the forces that move it 
(mass, force, momentum, elasticity, and so on). Similarly, in perceiving 
speech, we perceive not only its "kinematics," that is, the changes and rates 
of change in spectral structure, but also the underlying dynamic forces that 
produce these changes. In other words, to perceive speech is to perceive 
movements of the articulators, specified by a pattern of radiated sound, just
as we perceive movements of the hand, specified by a pattern of reflected 
light. 

The close link, for the infant, between perceiving speech and producing 
it, is further suggested by a curious aspect of the study by MacKain et 
al. (1983), cited earlier. This is the fact that infants' preference for a
match between the facial movements they were watching and the speech sounds
they were hearing was statistically significant only when they were looking to
their right sides. Fourteen of the eighteen infants in the study preferred 
more matches on their right sides than on their left. Moreover, in a fol- 
low-up investigation of familial handedness, MacKain and her colleagues have 
learned that six of the infants have left-handed first or second order rela- 
tives. Of these six, four are the infants who displayed more left-side than 
right-side matches. 
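
The right-side count can be compared against chance with an exact sign test. The source does not say which statistical test MacKain et al. used, so the sketch below is an illustrative assumption, not a reconstruction of their analysis. It treats each infant as an independent binary outcome with a chance probability of 0.5, and the function name is hypothetical.

```python
from math import comb


def sign_test_two_sided(successes, n):
    """Exact two-sided binomial p-value against chance (p = 0.5)."""
    k = max(successes, n - successes)  # use the larger of the two counts
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)  # double the upper tail, capped at 1


if __name__ == "__main__":
    # 14 of 18 infants showed more right-side than left-side matches.
    print(round(sign_test_two_sided(14, 18), 4))
```

Under these assumptions the two-sided probability of a 14-to-4 split or one more extreme is about 0.03.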

These results can be interpreted in the light of studies by Kinsbourne 
and his colleagues. Kinsbourne (1972) found that right-handed adults tended 
to shift their gaze to the right while solving verbal problems, to the left 
while visualizing spatial relations; left-handers tended to shift gaze in the 
same direction for both types of task, with each direction roughly equally 
represented across the subject group. Lempert and Kinsbourne (1982) showed 
that the effect was reversible for right-handed subjects on a verbal task: 
subjects who rehearsed sentences with head and eyes turned right recalled the 
sentences better than subjects who rehearsed while turned left. Thus, atten- 
tion to one side of the body may facilitate processes for which the contralat- 
eral hemisphere is specialized. 
Extending this interpretation to the infants of MacKain et al. (1983), we
may infer that infants with a preference for matches on the right side, rather 
than the left, were revealing a left hemisphere capacity for recognizing 
acoustic-optic correspondence in speech. If, further, the metric specifying
these correspondences is the same as that specifying the auditory-motor corre- 
spondences necessary for imitation (as was argued above), we may conclude that 
five- to six-month-old infants already display a speech perceptuo-motor link 
in the left hemisphere. 

How early this link may develop we do not yet know. However, Best et 
al. (1982), testing two-, three-, and four-month-old infants dichotically, in
a cardiac habituation paradigm, found a right-ear advantage for speech and a 
left-ear advantage for music in the three- and four-month olds, but only a 
left-ear advantage for music in the two-month olds. We may suspect, then, 
that the perceptual component of the speech link begins to develop between the
second and third months of life. By five to six months, close to the onset of
babbling, the motor component is beginning to emerge. By the end of the first 
year, as babbling fades, the infant would be equipped with the perceptuo-motor 
mechanisms necessary for imitating the sounds of the language it is going to 
learn. 

In conclusion, let me recall the paradoxical discrepancy between the 
speech signal and its linguistic description with which I began. The approach 
to imitation I have sketched deliberately sidesteps this problem. Yet it may 
ultimately contribute to its solution by focusing on the infant for whom the 
discrepancy does not yet exist, for the simple reason that the infant has not 
yet learned the phonetic categories of its language. Tracing the process by
which the recurrent patterns of infant articulation coalesce into categorical 
linguistic units, evidenced by spoonerisms and other adult speech errors 
(Shattuck-Hufnagel, 1979), is a task for the future. However, the task may be
easier, if we see it as a problem in the development of a unique mode of motor 
control, characteristic of human language. 
Footnote

1 I have often heard it said that blind children develop language more
slowly than their sighted peers, but I know of no systematic study on the top- 
ic. 
Abstract. A motor theory of speech perception, initially proposed
to account for results of early experiments with synthetic speech,
is now extensively revised to accommodate recent findings, and to
relate the assumptions of the theory to those that might be made
about other perceptual modes. According to the revised theory,
phonetic information is perceived in a biologically distinct system,
a "module" specialized to detect the intended gestures of the speak-
er that are the basis for phonetic categories. Built into the
structure of this module is the unique but lawful relationship be-
tween the gestures and the acoustic patterns in which they are vari-
ously overlapped. In consequence, the module causes perception of
phonetic structure without translation from preliminary auditory
impressions. Thus, it is comparable to such other modules as the
one that enables an animal to localize sound. Peculiar to the
phonetic module are the relation between perception and production
it incorporates and the fact that it must compete with other modules
for the same stimulus variations. 

Together with some of our colleagues, we have long been identified with a
view of speech perception that is often referred to as a "motor theory." Not
the motor theory, to be sure, because there are other theories of perception
that, like ours, assign an important role to movement or its sources. But the
theory we are going to describe is only about speech perception, in contrast
to some that deal with other perceptual processes (e.g., Berkeley, 1709; Fest-
inger, Burnham, Ono, & Bamber, 1967) or, indeed, with all of them (e.g.,
Washburn, 1926; Watson, 1919). Moreover, our theory is motivated by
considerations that do not necessarily apply outside the domain of speech.
Yet even there we are not alone, for several theories of speech perception,
being more or less "motor," resemble ours to varying degrees (e.g., Chisto-
vich, 1960; Dudley, 1940; Joos, 1948; Ladefoged & McKinney, 1963; Stetson,
1951). However, it is not relevant to our purposes to compare these, so, for
convenience, we will refer to our motor theory as the motor theory.

We were led to the motor theory by an early finding that the acoustic 
patterns of synthetic speech had to be modified if an invariant phonetic per- 
cept was to be produced across different contexts (Cooper, Delattre, Liberman, 
Borst, & Gerstman, 1952; Liberman, Delattre, & Cooper, 1952). Thus, it ap-
peared that the objects of speech perception were not to be found at the
acoustic surface. They might, however, be sought in the underlying motor pro-
cesses, if it could be assumed that the acoustic variability required for an
invariant percept resulted from the temporal overlap, in different contexts,
of correspondingly invariant units of production. In its most general form,
this aspect of the early theory survives, but there have been important revi-
sions, including especially the one that makes perception of the motor invari-
ant depend on a specialized phonetic mode (Liberman, 1982; Liberman, Cooper,
Shankweiler, & Studdert-Kennedy, 1967; Liberman & Studdert-Kennedy, 1978; Mat-
tingly & Liberman, 1969). Our aim in this paper is to present further revi-
sions, and so bring the theory up to date.



The first claim of the motor theory, as revised, is that the objects of 
speech perception are the intended phonetic gestures of the speaker, 
represented in the brain as invariant motor commands that call for movements 
of the articulators through certain linguistically significant configurations. 
These gestural commands are the physical reality underlying the traditional 
phonetic notions — for example, "tongue backing," "lip rounding," and "jaw 
raising" — that provide the basis for phonetic categories. They are the ele-
mentary events of speech production and perception. Phonetic segments are
simply groups of one or more of these elementary events; thus [b] consists of
a labial stop gesture and [m] of that same gesture combined with a ve-
lum-lowering gesture. Phonologically, of course, the gestures themselves must
be viewed as groups of features, such as "labial," "stop," "nasal," but these
features are attributes of the gestural events, not events as such. To per-
ceive an utterance, then, is to perceive a specific pattern of intended ges-
tures.



We have to say "intended gestures," because, for a number of reasons 
(coarticulation being merely the most obvious), the gestures are not directly
manifested in the acoustic signal or in the observable articulatory movements.
It is thus no simple matter (as we shall see in a later section) to define
specific gestures rigorously or to relate them to their observable conse-
quences. Yet, clearly, invariant gestures of some description there must be,
for they are required, not merely for our particular theory of speech percep-
tion, but for any adequate theory of speech production.

The second claim of the theory is a corollary of the first: if speech 
perception and speech production share the same set of invariants, they must 
be intimately linked. This link, we argue, is not a learned association, a 
result of the fact that what people hear when they listen to speech is what
they do when they speak. Rather, the link is innately specified, requiring
only epigenetic development to bring it into play. On this claim, perception
of the gestures occurs in a specialized mode, different in important ways from
the auditory mode, responsible also for the production of phonetic structures,
and part of the larger specialization for language. The adaptive function of
the perceptual side of this mode, the side with which the motor theory is
directly concerned, is to make the conversion from acoustic signal to gesture 
automatically, and so to let listeners perceive phonetic structures without 
mediation by (or translation from) the auditory appearances that the sounds 
might, on purely psychoacoustic grounds, be expected to have.

A critic might note that the gestures do produce acoustic signals, after 
all, and that surely it is these signals, not the gestures, which stimulate 
the listener's ear. What can it mean, then, to say it is the gestures, not 
the signals, that are perceived? Our critic might also be concerned that the 
theory seems at first blush to assign so special a place to speech as to make 
it hard to think about in normal biological terms. We should, therefore, try 
to forestall misunderstanding by showing that, wrong though it may be, the 
theory is neither logically meaningless nor biologically unthinkable. 

An Issue That Any Theory of Speech Perception Must Meet. The motor theo-
ry would be meaningless if there were, as is sometimes supposed, a one-to-one
relation between acoustic patterns and gestures, for in that circumstance it 
would matter little whether the listener was said to perceive the one or the 
other. Metaphysical considerations aside, the proximal acoustic patterns 
might as well be the perceived distal objects. But the relation between ges- 
ture and signal is not straightforward. The reason is that the timing of the 
articulatory movements — the peripheral realizations of the gestures — is not
simply related to the ordering of the gestures that is implied by the strings
of symbols in phonetic transcriptions: the movements for gestures implied by 
a single symbol are typically not simultaneous, and the movements implied by 
successive symbols often overlap extensively. This coarticulation means that 
the changing shape of the vocal tract, and hence the resulting signal, is in-
fluenced by several gestures at the same time. Thus, the relation between
gesture and signal, though certainly systematic, is systematic in a way that 
is peculiar to speech. In later sections of the paper we will consider how
this circumstance bears on the perception of speech and its theoretical 
interpretation. For now, however, we wish only to justify consideration of 
the motor theory by identifying it as one of several choices that the complex 
relation between gesture and signal faces us with. For this purpose, we will
describe just one aspect of the relation, that we may then use it as an exam- 
ple. 

When coarticulation causes the signal to be influenced simultaneously by 
several gestures, a particular gesture will necessarily be represented by dif- 
ferent sounds in different phonetic contexts. In a consonant-vowel syllable, 
for example, the acoustic pattern that contains information about the place of 
constriction of the consonantal gesture will vary depending on the following
vowel. Such context-conditioned variation is most apparent, perhaps, in the
transitions of the formants as the constriction is released. Thus, place
information for a given consonant is carried by a rising transition in one
vowel context and a falling transition in another (Liberman, Delattre, Cooper,
& Gerstman, 1954). In isolation, these transitions sound like two different
glissandi or chirps, which is just what everything we know about auditory 
perception leads us to expect (Mattingly, Liberman, Syrdal, & Halwes, 1971); 
they do not sound alike, and, just as important, neither sounds like speech.
How is it, then, that, in context, they nevertheless yield the same consonant?

Auditory theories and the accounts they provide. The guiding
assumption of one class of theories is that ordinary auditory processes are 
sufficient to explain the perception of speech; there is no need to invoke a 
further specialization for language, certainly not one that gives the listener 
access to gestures. The several members of this class differ in principle, 
though they are often combined in practice. 

One member of the class counts two stages in the perceptual process: a 
first stage in which, according to principles that apply to the way we hear 
all sounds, the auditory appearances of the acoustic patterns are registered, 
followed by a second stage in which, by an act of sorting or matching to 
prototypes, phonetic labels are affixed (Crowder & Morton, 1969; Fujisaki & 
Kawashima, 1970; Oden & Massaro, 1978; Pisoni, 1973). Just why such different 
acoustic patterns as the rising and falling transitions of our example deserve 
the same label is not explicitly rationalized, it being accounted, presumably, 
a characteristic of the language that the processes of sorting or matching are 
able to manage. Nor does the theory deal with the fact that, in appropriate 
contexts, these transitions support phonetic percepts but do not also produce 
such auditory phenomena as chirps. To the contrary, indeed, it is sometimes 
made explicit that the auditory stage is actually available for use in 
discrimination. Such availability is not always apparent because the casual
(or forgetful) listener is assumed to rely on the categorical labels, which 
persist in memory, rather than on the context-sensitive auditory impressions, 
which do not; but training or the use of more sensitive psychophysical methods
is said to give better access to the auditory stage and thus to the stimulus 
variations — including, presumably, the differences in formant transition — that 
the labels ignore (Carney, Widin, & Viemeister, 1977; Pisoni & Tash, 1974;
Samuel, 1977). 

Another member of the class of auditory theories avoids the problem of 
context-conditioned variation by denying its importance. According to this 
theory, speech perception relies on there being at least a brief period during 
each speech sound when its short-time spectrum is reliably distinct from those 
of other speech sounds. For an initial stop in a stressed syllable, for example,
this period includes the burst and the first 10 ms after the onset of
voicing (Stevens & Blumstein, 1978). That a listener is nevertheless able to 
identify speech sounds from which these invariant attributes have been removed 
is explained by the claim that, in natural speech, they are sometimes missing 
or distorted, so that the child must learn to make use of secondary, con- 
text-conditioned attributes, such as formant transitions, which ordinarily 
co-occur with the primary, invariant attributes (Cole & Scott, 1974). Thus,
presumably, the different-sounding chirps develop in perception to become the
same-sounding (non-chirpy) phonetic element with which they have been
associated.

The remaining member of this class of theories is the most thoroughly au- 
ditory of all. By its terms, the very processes of phonetic classification 
depend directly on properties of the auditory system, properties so independent
of language as to be found, perhaps, in all mammals (Kuhl, 1981; Miller,
1977; Stevens, 1975). As described most commonly in the literature, this version
of the auditory theory takes the perceived boundary between one phonetic
category and another to correspond to a naturally-occurring discontinuity in
perception of the relevant acoustic continuum. There is thus no first stage 
in which the (often) different auditory appearances are available, nor is 
there a process of learned equivalence. An example is the claim that the
distinction between voiced and voiceless stops — normally cued by a complex of
acoustic differences caused by differences in the phonetic variable known as 
voice-onset-time — depends on an auditory discontinuity in sensitivity to tem- 
poral relations among components of the signal (Kuhl & Miller, 1975; Pisoni, 
1977). Another is the suggestion that the boundary between fricative and 
affricate on a rise-time continuum is the same as the rise-time boundary in 
the analogous nonspeech case — that is, the boundary that separates the nonspeech
percepts "pluck" and "bow" (Cutting & Rosner, 1974; but see Rosen & Howell,
1981). To account for the fact that such discontinuities move as a
function of phonetic context or rate of articulation, one can add the assumption
that the several components of the acoustic signal give rise to interactions
of a purely auditory sort (Hillenbrand, 1984; but see Summerfield,
1982). As for the rising and falling formant transitions of our earlier example,
some such assumption of auditory interaction (between the transitions and
the remainder of the acoustic pattern) would presumably be offered to account 
for the fact that they sound like two different glissandi in isolation, but as 
the same (non-glissando-like) consonant in the context of the acoustic syllable.
The clear implication of this theory is that, for all phonetic contexts
and for every one of the many acoustic cues that are known to be of conse- 
quence for each phonetic segment, the motivation for articulatory and 
coarticulatory maneuvers is to produce just those acoustic patterns that fit
the language-independent characteristics of the auditory system. Thus, this 
last auditory theory is auditory in two ways: speech perception is governed 
by auditory principles, and so, too, is speech production. 

The account provided by the motor theory. The motor theory offers a
view radically different from the auditory theories, most obviously in the 
claim that speech perception is not to be explained by principles that apply 
to perception of sounds in general, but must rather be seen as a specialization
for phonetic gestures. Incorporating a biologically based link between
perception and production, this specialization prevents listeners from hearing 
the signal as an ordinary sound, but enables them to use the systematic, yet 
special, relation between signal and gesture to perceive the gesture. The relation
is systematic because it results from lawful dependencies among gestures,
articulator movements, vocal-tract shapes, and signal. It is special
because it occurs only in speech. 

Applying the motor theory to our example, we suggest what has seemed 
obvious since the importance of the transitions was discovered: the listener
uses the systematically varying transitions as information about the coarticulation
of an invariant consonant gesture with various vowels, and so perceives
this gesture. Perception requires no arbitrary association of signal with
phonetic category, and no correspondingly arbitrary progression from an auditory
stage (e.g., different-sounding glissandi) to a superseding phonetic label.
As Studdert-Kennedy (1976) has put it, the phonetic category "names itself."

By way of comparison with the last of the auditory theories we described,
we note that, just as this theory is in two ways auditory, the motor theory is
in two ways motor. First, because it takes the proper object of phonetic
perception to be a motor event. And, second, because it assumes that adaptations
of the motor system for controlling the organs of the vocal tract took
precedence in the evolution of speech. These adaptations made it possible,
not only to produce phonetic gestures, but also to coarticulate them so that
they could be produced rapidly. A perceiving system, specialized to take account
of the complex acoustic consequences, developed concomitantly. Accordingly,
the theory is not indifferently perceptual or motor, implying simply
that the basis of articulation and the object of perception are the same. 
Rather, the emphasis is quite one-sided; therefore the theory fully deserves 
the epithet "motor."

How the Motor Theory Makes Speech Perception Like Other Specialized 
Perceiving Systems. The specialized perceiving system that the motor theory 
assumes is not unique; it is, rather, one of a rather large class of special 
systems or "modules." Accordingly, one can think about it in familiar biolog- 
ical terms. Later, we will consider more specifically how the phonetic module 
fits the concept of modularity developed recently by Fodor (1983); our concern 
now is only to compare the phonetic module with others.

The modules we refer to have in common that they are special neural
structures, designed to take advantage of a systematic but unique relation between
a proximal display at the sense organ and some property of a distal object.
A result in all cases is that there is not, first, a cognitive
representation of the proximal pattern that is modality-general, followed by
translation to a particular distal property; rather, perception of the distal
property is immediate, which is to say that the module has done all the hard
work. Consider auditory localization as an example. One of several cues is 
differences in time of arrival of particular frequency components of the sig- 
nal at the two ears (see Hafter, 1984, for a review). No one would claim that 
the use of this cue is part of the general auditory ability to perceive, as 
such, the size of the time interval that separates the onsets of two different 
signals. Certainly, this kind of general auditory ability does exist, but it 
is no part of auditory localization, either psychologically or physiologically.
Animals perceive the location of sounding objects only by means of neural
structures specialized to take advantage of the systematic but special relation
between proximal stimulus and distal location (see, for example, Knudsen,
1984). The relation is systematic for obvious reasons; it is special because 
it depends on the circumstance that the animal has two ears, and that the ears 
are set a certain distance apart. In the case of the human, the only species
for which the appropriate test can be made, there is no translation from perceived
disparity in time because there is no perceived disparity.
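
As a purely illustrative aside, the dependence of the arrival-time cue on the separation of the two ears can be sketched numerically. The fragment below is not part of the original argument. It assumes a simplified straight-path (far-field) geometry that ignores diffraction around the head; the ear separation and the speed of sound are assumed round values, not measurements; and the function names are hypothetical.

```python
import math

SPEED_OF_SOUND_M_PER_S = 343.0  # assumed: air at roughly 20 degrees C
EAR_SEPARATION_M = 0.18         # assumed, illustrative value only


def interaural_time_difference(azimuth_deg,
                               ear_separation_m=EAR_SEPARATION_M,
                               speed_m_per_s=SPEED_OF_SOUND_M_PER_S):
    """Arrival-time difference (seconds) between the two ears.

    Simplified straight-path model for a distant source:
    0 degrees is straight ahead, positive angles are toward one side.
    """
    return ear_separation_m * math.sin(math.radians(azimuth_deg)) / speed_m_per_s


def azimuth_from_itd(itd_s,
                     ear_separation_m=EAR_SEPARATION_M,
                     speed_m_per_s=SPEED_OF_SOUND_M_PER_S):
    """Invert the same simplified model: time difference -> azimuth (degrees)."""
    ratio = itd_s * speed_m_per_s / ear_separation_m
    ratio = max(-1.0, min(1.0, ratio))  # guard against rounding outside [-1, 1]
    return math.degrees(math.asin(ratio))


if __name__ == "__main__":
    for angle in (0.0, 30.0, 90.0):
        itd_us = interaural_time_difference(angle) * 1e6
        print(f"azimuth {angle:5.1f} deg -> time difference {itd_us:7.1f} microseconds")
```

Under these assumptions the mapping from time difference to location is fixed entirely by the geometry of the head and the physics of sound, which is the sense in which the relation is systematic yet special to a listener with two ears set a certain distance apart.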

Compare this with the voicing distinction (e.g., [ba] vs. [pa]) referred 
to earlier, which is cued in part by a difference in time of onset of the several
formants, and which has therefore been said by some to rest on a general
auditory ability to perceive temporal disparity as such (Kuhl & Miller, 1975; 
Pisoni, 1977). We believe, to the contrary, that the temporal disparity is 
only the proximal occasion for the unmediated perception of voicing, a distal 
gesture represented at the level of articulation by the relative timing of vo- 
cal-tract opening and start of laryngeal vibration (Lisker & Abramson, 1964). 
So we should expect perceptual judgments of differences in signal onset-time
to have no more relevance to the voicing distinction than to auditory locali- 
zation. In neither case do general auditory principles and procedures 
enlighten us. Nor does it help to invoke general principles of auditory
interaction. The still more general principle that perception gives access to 
distal objects tells us only that auditory localization and speech perception 
work as they are supposed to; it does not tell us how. Surely the "how" is to
be found, not by studying perception, even auditory perception, in general, 
but only by studying auditory localization and speech perception in particu- 
lar. Both are special systems; they are, therefore, to be understood only in 
their own terms.

Examples of such biologically specialized perceiving modules can be
multiplied. Visual perception of depth by use of information about binocular 
disparity is a well-studied example that has the same general characteristics 
we have attributed to auditory localization and speech (Julesz, 1960, 1971; 
Poggio, 1984). And there is presumably much to be learned by comparison with 
such biologically coherent systems as those that underlie echolocation in bats 
(Suga, 1984) or song in birds (Marler, 1970; Thorpe, 1958). But we will not 
elaborate, for the point to be made here is only that, from a biological point 
of view, the assumptions of the motor theory are not bizarre. 

How the Motor Theory Makes Speech Perception Different from Other Specialized
Perceiving Systems. Perceptual modules, by definition, differ from
one another in the classes of distal events that form their domains and in the 
relation between these events and the proximal displays. But the phonetic 
module differs from others in at least two further respects. 

Auditory and phonetic domains. The first difference is in the
locale of the distal events. In auditory localization, the distal event is 
"out there," and the relation between it and the proximal display at the two 
ears is completely determined by the principles of physical acoustics. Much 
the same can be said of those specialized modules that deal with the 
primitives of auditory quality, however they are to be characterized, and that 
come into play when people perceive, for example, whistles, horns, breaking 
glass, and barking dogs. Not so for the perception of phonetic structure.
There, the distal object is a phonetic gesture or, more explicitly, an
"upstream" neural command for the gesture from which the peripheral articula- 
tor movements unfold. It follows that the relation between distal object and 
proximal stimulus will have the special feature that it is determined not just 
by acoustic principles but also by neuromuscular processes internal to the 
speaker. Of course, analogues of these processes are also available as part 
of the biological endowment of the listener. Hence, some kind of link between 
perception and production would seem to characterize the phonetic module, but 
not those modules that provide auditory localization or visual perception of 
depth. In a later section, we will have more to say about this link. Now we 
will only comment that it may conceivably resemble, in its most general 
characteristics, those links that have been identified in the communication 
modules of certain nonhuman creatures (Gerhardt & Rheinlaender, 1982; Hoy,
Hahn, & Paul, 1977; Hoy & Paul, 1973; Katz & Gurney, 1981; Margoliash, 1983;
McCasland & Konishi, 1983; Nottebohm, Stokes, & Leonard, 1976; Williams,
1984).

The motor theory aside, it is plain that speech somehow informs listeners
about the phonetic intentions of the talker. The particular claim of the mo- 
tor theory is that these intentions are represented in a specific form in the 
talker's brain, and that there is a perceiving module specialized to lead the 
listener effortlessly to that representation. Indeed, what is true of speech
in this respect is true for all of language, except, of course, that the more
distal object for language is some representation of linguistic structure, not
merely of gesture, and that access to this object requires a module that is
not merely phonetic, but phonological and syntactic as well.

Competition between phonetic and auditory modes. A second important
difference between the phonetic module and the others has to do with the question:
how does the module cooperate or compete with others that use stimuli
of the same broadly defined physical form? For auditory localization, the key
to the answer is the fact that the module is turned on by a specific and
readily specifiable characteristic of the proximal stimulus: a particular 
range of differences in time of arrival at the two ears. Obviously, such 
differences have no other utility for the perceiver but to provide information 
about the distal property, location; there are no imaginable ecological circumstances
in which a person could use this characteristic of the proximal
stimuli to specify some other distal property. Thus, the proximal display and
the distal property it specifies only complement the other aspects of what a
listener hears; they never compete. 

In phonetic perception, things are quite different because important
acoustic cues are often similar to, even identical with, the stimuli that inform
listeners about a variety of nonspeech events. We have already remarked
that, in isolation, formant transitions sound like glissandi or chirps. Now 
surely we don't want to perceive these as glissandi or chirps when we are 
listening to speech, but we do want to perceive them so when we are listening 
to music or to birdsong. If this is true for all of the speech cues, as in 
some sense it presumably is, then it is hard to see how the module can be 
turned on by acoustic stigmata of any kind — that is, by some set of necessary 
cues defined in purely acoustic terms. We will consider this matter in some 
greater detail later. For now, however, the point is only that cues known to 
be of great importance for phonetic events may be cues for totally unrelated 
nonphonetic events, too. A consequence is that, in contrast to the generally 
complementary relation of the several modules that serve the same broadly de- 
fined modality (e.g., depth and color in vision), the phonetic and auditory 
modules are in direct competition. (For a discussion of how this competition
might be resolved, see Mattingly & Liberman, 1985.)



Having briefly described one motive for the motor theory — the
context-conditioned variation in the acoustic cues for constant phonetic
categories — we will now add others. We will limit ourselves to the so-called
segmental aspects of phonetic structure, though the theory ought, in principle,
to apply in the suprasegmental domain as well (cf. Fowler, 1982).

The two parts of the theory — that gestures are the objects of perception 
and that perception of these gestures depends on a specialized module — might 
be taken to be independent, as they were in their historical development, but
the relevant data are not. We therefore cannot rationally apportion the data 
between the parts, but must rather take them as they come. 

A result of articulation: The multiplicity, variety, and equivalence of
cues for each phonetic percept. When speech synthesis began to be used as a
tool to investigate speech perception, it was soon discovered that, in any 
specific context, a particular local property of the acoustic signal was 
sufficient for the perception of one phonetic category rather than another
and, more generally, that the percept could be shifted along some phonetic di- 
mension by varying the synthetic stimulus along a locally-definable acoustic 
dimension. For example, if the onset frequency of the transition of the second
formant during a stop release is sufficiently low, relative to the frequency
of the following steady state, the stop is perceived as labial; otherwise,
as apical or dorsal (Liberman et al., 1954). A value along such an
acoustic dimension that was optimal for a particular phonetic category, or, 
more loosely, the dimension itself, was termed an "acoustic cue."

Of course, the fact that particular acoustic cues can be isolated must,
of itself, tell us something about speech perception, for it might have been 
otherwise. Thus, it is possible to imagine a speech-perception mechanism, 
equipped, perhaps, with auditory templates, that would break down if presented 
with anything other than a wholly natural and phonetically optimal stimulus.
Listeners would either give conflicting and unreliable phonetic judgments or
else not hear speech at all. Clearly, the actual mechanism is not of this
kind, and the concept of cue accords with this fact. 
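
The second-formant example given earlier can be restated as a toy decision rule, offered only to make the notion of a locally definable cue concrete. This sketch is an assumption-laden illustration: the function name and the 200 Hz margin are hypothetical, the rule could hold at best for a single fixed synthetic context, and nothing here implies that a phonetic category can be defined acoustically in general.

```python
def perceived_place_class(f2_onset_hz, f2_steady_hz, margin_hz=200.0):
    """Toy rule for one fixed synthetic context (illustrative only).

    If the second-formant transition starts sufficiently far below the
    following steady state, report "labial"; otherwise report
    "apical or dorsal". The margin is a hypothetical placeholder, not a
    measured perceptual boundary.
    """
    if f2_steady_hz - f2_onset_hz >= margin_hz:
        return "labial"
    return "apical or dorsal"


if __name__ == "__main__":
    # A rising transition into the vowel versus a falling one.
    print(perceived_place_class(f2_onset_hz=900.0, f2_steady_hz=1500.0))
    print(perceived_place_class(f2_onset_hz=1800.0, f2_steady_hz=1500.0))
```

Because the rule compares the onset with the steady state of the particular vowel, any fixed onset frequency will be classified differently as the vowel changes, which is one way of seeing why the cue is context-conditioned.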

Nevertheless, the emphasis on the cues has, perhaps, been unfortunate, 
for the term "cue" might seem to imply a claim about the elemental units of 
speech perception. But "cue" was simply a convenient bit of laboratory jargon 
referring to acoustic variables whose definition depended very much on the de- 
sign features of the particular synthesizers that were used to study them. 
The cues, as such, have no role in a theory of speech perception; they only 
describe some of the facts on which a theory might be based (cf. Bailey & 
Summerfield, 1980). There are, indeed, several generalizations about the
cues — some only hinted at by the data now available, others quite well
founded — that are relevant to such a theory.

One such generalization is that every "potential" cue — that is, each of
the many acoustic events peculiar to a linguistically significant gesture — is 
an actual cue. (For example, every one of eighteen potential cues to the 
voicing distinction in medial position has been shown to have some perceptual 
value; Lisker, 1978.) All possible cues have not been tested, and probably 
never will be, but no potential cue has yet been found that could not be shown 
to be an actual one. 

A closely related generalization is that, while each cue is, by definition,
more or less sufficient, none is truly necessary. The absence of any
single cue, no matter how seemingly characteristic of the phonetic category,
can be compensated for by others, not without some cost to naturalness or even
intelligibility, perhaps, but still to such an extent that the intended category
is, in fact, perceived. Thus, stops can be perceived without silent periods,
fricatives without frication, vowels without formants, and tones without
pitch (Abramson, 1972; Inoue, 1984; Remez & Rubin, 1984; Repp, 1984;
Yeni-Komshian & Soli, 1981). 

Yet another generalization is that even when several cues are present, 
variations in one can, within limits, be compensated for by offsetting variations
in another (Dorman, Raphael, & Liberman, 1979; Dorman, Studdert-Kennedy,
& Raphael, 1977; Hoffman, 1958; Howell & Rosen, 1983; Lisker, 1957; Summerfield
& Haggard, 1977). In the case of the contrast between fricative-vowel
and fricative-stop-vowel (as in [sa] vs. [sta]), investigators have found that
two important cues, silence and appropriate formant transitions, engage in 
just such a trading relation. That this bespeaks a true equivalence in 
perception was shown by experiments in which the effect of variation in one 
cue could, depending on its "direction," be made to "add to" or "cancel out" 
the effect of the other (Fitch, Halwes, Erickson, & Liberman, 1980). Signif- 
icantly, this effect can also be obtained with sine-wave analogues of speech, 
but only for subjects who perceive these signals as speech, not for those who 
perceive them as nonspeech tones (Best, Morrongiello, & Robson, 1981).

Putting together all the generalizations about the multiplicity and variety
of acoustic cues, we should conclude that there is simply no way to define
a phonetic category in purely acoustic terms. A complete list of the 
cues— surely a cumbersome matter at best—is not feasible, for it would 
necessarily include all the acoustic effects of phonetically distinctive
articulations. But even if it were possible to compile such a list, the re- 
sult would not repay the effort, because none of the cues on the list could be 
deemed truly essential. As for those cues that might, for any reason, be finally
included, none could be assigned a characteristic setting, since the effect
of changing it could be offset by appropriate changes in one or more of
the others. This surely tells us something about the design of the phonetic 
module. For if phonetic categories were acoustic patterns, and if, according- 
ly, phonetic perception were properly auditory, one should be able to describe 
quite straightforwardly the acoustic basis for the phonetic category and 
associated percept. According to the motor theory, by contrast, one would ex- 
pect the acoustic signal to serve only as a source of information about the 
gestures; hence the gestures would properly define the category. As for the 
perceptual equivalence among diverse cues that is shown by the trading rela- 
tions, explaining that on auditory grounds requires ad hoc assumptions. But 
if, as the motor theory would have it, the gesture is the distal object of 
perception, we should not wonder that the several sources of information about 
it are perceptually equivalent, for they are products of the same linguistically
significant gesture.

A result of coarticulation: I. Segmentation in sound and percept.
Traditional phonetic transcription represents utterances as single linear sequences
of symbols, each of which stands for a phonetic category. It is an
issue among phonologists whether such transcriptions are really theoretically
adequate, and various alternative proposals have been made in an effort to
provide a better account. This matter need not concern us here, however,
since all proposals have in common that phonetic units of some description are
ordered from left to right. Some sort of segmentation is thus always implied,
and what theory must take into account is that the perceived phonetic object 
is thus segmented. 

Segmentation of the phonetic percept would be no problem for theory if 
the proximal sound were segmented correspondingly. But it is not, nor can it 
be, if speech is to be produced and perceived efficiently. To maintain a
straightforward relation in segmentation between phonetic unit and signal 
would require that the sets of phonetic gestures corresponding to phonetic 
units be produced one at a time, each in its turn. The obvious consequence
would be that each unit would become a syllable, in which case talkers could
speak only as fast as they could spell. A function of coarticulation is to
evade this limitation. There is an important consequence, however, which is
that there is now no straightforward correspondence in segmentation between 
the phonetic and acoustic representations of the information (Fant, 1962; 
Joos, 1948). Thus, the acoustic information for any particular phonetic unit
is typically overlapped, often quite thoroughly, with information for other
units. Moreover, the span over which that information extends, the amount of 
overlap, and the number of units signalled within the overlapped portion all 
vary according to the phonetic context, the rate of articulation, and the language
(Magen, 1984; Manuel & Krakow, 1984; Ohman, 1966; Recasens, 1984; Repp,
Liberman, Eccardt, & Pesetsky, 1978; Tuller, Harris, & Kelso, 1982).

There are, perhaps, occasional stretches of the acoustic signal over
which there is information about only one phonetic unit — for example, in the 
middle of the frication in a slowly articulated fricative-vowel syllable and 
in vowels that are sustained for artificially long times. Such stretches do,
of course, offer a relation between acoustic patterns and phonetic units that
would be transparent if phonetic perception were merely auditory. But even in 
these cases, the listener automatically takes account of, not just the trans- 
parent part of the signal, but the regions of overlap as well (Mann & Repp, 
1980, 1981; Whalen, 1981). Indeed, the general rule may be that the phonetic 
percept is normally made available to consciousness only after all the relevant
acoustic information is in, even when earlier cues might have been sufficient
(Martin & Bunnell, 1981, 1982; Repp et al., 1978).

What wants explanation, then, is that the percept is segmented in a way 
that the signal is not, or, to put it another way, that the percept does not 
mirror the overlap of information in the sound (cf. Fowler, 1984). The motor
theory does not provide a complete explanation, certainly not in its present 
state, but it does head the theoretical enterprise in the right direction. At 
the very least, it turns the theorist away from the search for those unlikely 
processes that an auditory theory would suggest: How listeners learn phonetic 
labels for what they hear and thus re-interpret perceived overlap as sequences 
of discrete units; or how discrete units emerge in perception from interac- 
tions of a purely auditory sort. The first process seems implausible on its 
face, the second because it presupposes that the function of the many kinds 
and degrees of coarticulation is to produce just those combinations of sounds 
that will interact in accordance with language-independent characteristics of 
the auditory system. In contrast, the motor theory begins with the assumption 
that coarticulation, and the resulting overlap of phonetic information in the 
acoustic pattern, is a consequence of the efficient processes by which discrete
phonetic gestures are realized in the behavior of more or less independent
articulators. The theory suggests, then, that an equally efficient
perceptual process might use the resulting acoustic pattern to recover the 
discrete gestures.

A result of coarticulation: II. Different sounds, different contexts,
same percept. That the phonetic percept is invariant even when the relevant
acoustic cue is not was the characteristic relation between percept and sound
that we took as an example in the first section. There, we observed that
variation in the acoustic pattern results from overlapping of putatively
invariant gestures, an observation that, as we remarked, points to the ges- 
ture, rather than the acoustic pattern itself, as the object of perception. 
We now add that the articulatory variation due to context is pervasive: in 
the acoustic representation of every phonetic category yet studied there are
context-conditioned portions that contribute to perception and that must, 
therefore, be taken into account by theory. Thus, for stops, nasals, frica- 
tives, liquids, semivowels, and vowels, the always context-sensitive transi- 
tions are cues (Harris, 1958; Jenkins, Strange, & Edman, 1983; Liberman et 
al., 1954; O'Connor, Gerstman, Liberman, Delattre, & Cooper, 1957; Strange, 
Jenkins, & Johnson, 1983). For steps cuid fricatives, the noises that are pro- 
duced at the point of constriction are also known to be cues, and, under some 
circumstances at lease, Inese, loo, vary with context (Dor-man et al., 1977; 
Liberman et al., 1952; Whalen, 1981). 
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An auditory theory that accounts for invariant perception in the face of 
so much variation in the signal would require a long list of apparently arbi- 
trary assumptions. For a motor theory, on the other hand, systematic stimulus 
variation is not an obstacle to be circumvented or overcome in some arbitrary 
way; it is, rather, a source of information about articulation that provides 
important guidance to the perceptual process in Getermining a representation 
of the distal gesture. 

A result of coarticulation: III. Same sound, different contexts,
different percepts. When phonetic categories share one feature but differ in
another, the relation between acoustic pattern and percept speaks, again, to
the motor theory and its alternatives. Consider, once more, the fricative [s]
and the stop [t] in the syllables [sa] and [sta]. In synthesis, the second-
and third-formant transitions can be the same for these two categories, since
they have the same place of articulation; and the first-formant transition,
normally a cue to manner, can be made ambiguous between them. For such
stimuli, the perception of [sta] rather than [sa] depends on whether there is
an interval of silence between the noise for the [s] and the onsets of the
transitions.

Data relevant to an interpretation of the role of silence in thus
producing different percepts from the same transition come from two kinds of
experiments. First are those that demonstrate the effectiveness of the
transitions as cues for the place feature of the fricative in fricative-vowel
syllables (Harris, 1958). The transitions are not, therefore, masked by the
noise of the [s] friction, and thus the function of silence in a stop is not,
as it might be in an auditory theory, to protect the transitions from such
masking.
The second kind of experiment deals with the possibility of a purely auditory 
interaction — in this case, between silence and the formant transitions. Among 
the findings that make such auditory interaction seem unlikely is that silence 
affects perception of the formant transitions differently in and out of speech 
context and, further, that the effectiveness of silence depends on such
factors as continuity of talker and prosody (Dorman et al., 1979; Rakerd,
Dechovitz, & Verbrugge, 1982). But perhaps the most direct test for auditory
interaction is provided by experiments in which such interaction is ruled out 
by holding the acoustic context constant. This can be done by exploiting "du- 
plex perception," a phenomenon to be discussed in greater detail in the next 
section. Here it is appropriate to say only that duplex perception provides a 
way of presenting acoustic patterns so that, in a fixed context, listeners 
hear the same second- or third-formant transitions in two phenomenally differ- 
ent ways simultaneously: as nonspeech chirps and as cues for phonetic cate- 
gories. The finding is that the presence or absence of silence determines 
whether formant transitions appropriate for [t] or for [p], for example, are 
integrated into percepts as different as stops and fricatives; but silence has
no effect on the perception of the nonspeech chirps that these same
transitions produce (Liberman, Isenberg, & Rakerd, 1981). Since the latter result
eliminates the possibility of auditory interaction, we are left with the ac- 
count that the motor theory would suggest: that silence acts in the special- 
ized phonetic mode to inform the listener that the vocal tract was completely 
closed to produce a stop consonant, rather than merely constricted to produce 
a fricative. It follows, then, that silence will, by its presence or absence, 
determine whether identical transitions are cues in percepts that belong to
the one manner or the other.

An acoustic signal diverges to phonetic and auditory modes. We noted
earlier that a formant transition is perceptually very different depending on
whether it is perceived in the auditory mode, where it sounds like a chirp, or
in the phonetic mode, where it cues a "nonchirpy" consonant. Of course, the
comparison is not entirely fair, since acoustic context is not controlled:
the transition is presented in isolation in the one case, but as an element of
a larger acoustic pattern in the other. We should, therefore, call attention
to the fact that the same perceptual difference is obtained even when, by
resort to a special procedure, acoustic context is held constant (Liberman,
1979; Rand, 1974). This procedure, which produces the duplex percept referred
to earlier, goes as follows. All of an acoustic syllable except only the
formant transition that decides between, for example, [da] and [ga] is
presented to one ear. By itself, this pattern, called the "base," sounds like
a stop-vowel syllable, ambiguous between [da] and [ga]. To the other ear is
presented one or the other of the transitions appropriate for [d] or [g]. In
isolation, these sound like different chirps. Yet, when base and transition
are presented dichotically, and in the appropriate temporal relationship, they
give rise to a duplex percept: [da] or [ga], depending on the transition,
and, simultaneously, the appropriate chirp. (The fused syllable appears to be
in the ear to which the base had been presented, the chirp in the other.)
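
The dichotic arrangement just described can be sketched in code. What follows is only an illustration of the channel layout (base to one ear, isolated transition to the other, time-aligned), not a reconstruction of the stimuli used in the studies cited. The sample rate, durations, and frequencies are hypothetical, and the "base" here is a steady tone standing in for a real synthetic syllable.

```python
import math

SAMPLE_RATE = 10_000  # Hz; illustrative value, not taken from the cited studies


def glide(f_start, f_end, dur_s, amp=0.5):
    """Sinusoid whose frequency moves linearly from f_start to f_end (Hz)."""
    n = int(round(dur_s * SAMPLE_RATE))
    out, phase = [], 0.0
    for i in range(n):
        frac = i / max(n - 1, 1)
        freq = f_start + (f_end - f_start) * frac
        phase += 2.0 * math.pi * freq / SAMPLE_RATE  # accumulate phase
        out.append(amp * math.sin(phase))
    return out


def dichotic_pair(base, transition, onset_s=0.0):
    """Return (left, right) channels: base to one ear, transition to the other.

    The transition is written into an otherwise silent channel of the same
    length as the base, starting at onset_s, so the two ears stay aligned.
    """
    start = int(round(onset_s * SAMPLE_RATE))
    if start + len(transition) > len(base):
        raise ValueError("transition does not fit inside the base")
    right = [0.0] * len(base)
    right[start:start + len(transition)] = transition
    return list(base), right


# Hypothetical stand-ins: a 250 ms placeholder "base" and two 50 ms glides
# playing the role of the two alternative formant transitions.
base = glide(700.0, 700.0, 0.250)
chirp_falling = glide(2600.0, 2400.0, 0.050)
chirp_rising = glide(2000.0, 2400.0, 0.050)

left, right = dichotic_pair(base, chirp_falling)
```

Swapping `chirp_falling` for `chirp_rising` changes only the channel that carries the isolated transition; the base channel is identical in both cases, which is the sense in which acoustic context is held constant.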

Two related characteristics of duplex perception must be emphasized. One 
is that it is obtained only when the stimulus presented to one ear is, like 
the "chirpy" transition, of short duration and extremely unspeechlike in 
quality. If that condition is not met, as, for example, when the first two 
formants are presented to one ear and the entire third formant to the other, 
perception is not duplex. It is, on the contrary, simplex; one hears a
coherent syllable in which the separate components cannot be apprehended. (A very
different result is obtained when two components of a musical chord are 
presented to one ear, a third component to the other. In that case, listeners 
can respond to the third component by itself and also to that component com- 
bined with the first two [Pastore, Schmuckler, Rosenblum, & Szczesiul, 1983].) 

The other, closely related characteristic of duplex perception is that it
is precisely duplex, not triplex. That is, listeners perceive the nonspeech
chirp and the fused syllable, but they do not also perceive the base -- i.e.,
the syllable, minus one of the formant transitions -- that was presented to one
ear (Repp, Milburn, & Ashkenas, 1983). (In the experiment with musical chords
by Pastore et al., referred to just above, there was no test for duplex, as
distinguished from triplex, perception.)

The point is that duplex perception does not simply reflect the ability
of the auditory system to fuse dichotically presented stimuli and also, as in
the experiment with the chords, to keep them apart. Rather, the duplex
percepts of speech comprise the only two ways in which the transition, for
example, can be heard: as a cue for a phonetic gesture and as a nonspeech
sound. These percepts are strikingly different, and, as we have already seen,
they change in different, sometimes contrasting ways in response to variations
in the acoustic signals -- variations that must have been available to all
structures in the brain that can process auditory information. A reasonable
conclusion is that there must be two modules that can somehow use the same
input to produce simultaneous representations of two distal objects. (For
speculation about the mechanism that normally prevents perception of this
ecologically impossible situation, and about the reason why that highly
adaptive mechanism might be defeated by the procedures used to produce duplex
perception, see Mattingly & Liberman, 1985.)

Acoustic and optical signals converge on the phonetic mode. In duplex
perception, a single acoustic stimulus is processed simultaneously by the
phonetic and auditory modules to produce perception of two distal objects: a
phonetic gesture and a sound. In the phenomenon to which we turn now,
something like the opposite occurs: two different stimuli -- one acoustic, the
other optical -- are combined by the phonetic module to produce coherent
perception of a single distal event. This phenomenon, discovered by McGurk and
MacDonald
(1976), can be illustrated by this variant on their original demonstration. 
Subjects are presented acoustically with the syllables [ba], [ba], [ba] and
optically with a face that, in approximate synchrony, silently articulates
[be], [ve], [ðe]. The resulting and compelling percept is [ba], [va], [ða],
with no awareness that it is in any sense bimodal--that is, part auditory and 
part visual. According to the motor theory, this is so because the perceived 
event is neither; it is, rather, a gesture. The proximal acoustic signal and 
the proximal optical signal have in common, then, that they convey information 
about the same distal object. (Perhaps a similar convergence is implied by 
the finding that units in the optic tectum of the barn owl are bimodally
sensitive to acoustic and optical cues for the same distal property, location in
space; Knudsen, 1982). 

Even prelinguistic infants seem to have some appreciation of the relation 
between the acoustic and optical consequences of phonetic articulation. This 
is to be inferred from an experiment in which it was found that infants at 
four to five months of age preferred to look at a face that articulated the 
vowel they were hearing rather than at the same face articulating a different 
vowel (Kuhl & Meltzoff, 1982). Significantly, this result was not obtained
when the sources were pure tones matched in amplitude and duration to the
vowels. In a related study it was found that infants of a similar age looked
longer at a face repeating the disyllable they were hearing than at the same
face repeating another disyllable, though both disyllables were carefully
synchronized with the visible articulation (MacKain, Studdert-Kennedy,
Spieker, & Stern, 1983). Like the results obtained with adults in the
McGurk-MacDonald kind of experiment, these findings with infants imply a
perception-production link and, accordingly, a common mode of perception for
all proper information
about the gesture. 

The general characteristics that cause acoustic signals to be perceived
as speech. The point was made in an earlier section that acoustic definitions
of phonetic contrasts are, in the end, unsatisfactory. Now we would suggest
that acoustic definitions also fail for the purpose of distinguishing in
general between acoustic patterns that convey phonetic structures and those
that do not. Thus, speech cannot be distinguished from nonspeech by appeal to
surface properties of the sound. Surely, natural speech does have certain
characteristics of a general and superficial sort -- for example, formants with
characteristic bandwidths and relative intensities, stretches of waveform
periodicities that typically mark the voiced portion of syllables, peaks of
intensity corresponding approximately to syllabic rhythm, etc. -- and these can
be used by machines to detect speech. But research with synthesizers has
shown that speech is perceived even when such general characteristics are
absent. This was certainly true in the case of many of the acoustic patterns
that were used in work with the Pattern Playback synthesizer, and more
recently it has been shown to be true in the most extreme case of patterns
consisting only of sine waves that follow natural formant trajectories (Remez,
Rubin, Pisoni, & Carrell, 1981). Significantly, the converse effect is also
obtained. When reasonably normal formants are made to deviate into
acoustically continuous but abnormal trajectories, the percept breaks into two
categorically distinct parts: speech and a background of chirps, glissandi,
and assorted noises (Liberman & Studdert-Kennedy, 1978). Of course, the
trajectories of the formants are determined by the movements of the
articulators. Evidently, those trajectories that conform to possible
articulations engage the phonetic module; all others fail.
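
To make concrete what "patterns consisting only of sine waves that follow natural formant trajectories" amounts to as a signal, here is a minimal sketch, assuming one sinusoid per formant whose frequency is updated frame by frame. It is not the procedure of the study cited. The sample rate, frame length, amplitudes, and the formant tracks in the example are all hypothetical values chosen for illustration.

```python
import math

SAMPLE_RATE = 10_000  # Hz; illustrative value


def sine_wave_replica(formant_tracks, frame_s=0.010, amps=None):
    """Sum one sinusoid per formant track.

    formant_tracks: equal-length lists of center frequencies in Hz, one value
    per analysis frame. Each track drives one sinusoid; no bandwidths,
    harmonics, or noise are modeled, which is the point of the technique.
    """
    n_frames = len(formant_tracks[0])
    if any(len(track) != n_frames for track in formant_tracks):
        raise ValueError("all formant tracks must have the same length")
    if amps is None:
        # Equal weights summing to 1 keep the output within [-1, 1].
        amps = [1.0 / len(formant_tracks)] * len(formant_tracks)
    samples_per_frame = int(round(frame_s * SAMPLE_RATE))
    phases = [0.0] * len(formant_tracks)
    out = []
    for frame in range(n_frames):
        for _ in range(samples_per_frame):
            value = 0.0
            for k, track in enumerate(formant_tracks):
                # Phase accumulation avoids clicks when frequency changes.
                phases[k] += 2.0 * math.pi * track[frame] / SAMPLE_RATE
                value += amps[k] * math.sin(phases[k])
            out.append(value)
    return out


# Hypothetical tracks: ten 10 ms frames of a rising F1, falling F2, steady F3.
f1 = [300.0 + 40.0 * i for i in range(10)]
f2 = [1800.0 - 50.0 * i for i in range(10)]
f3 = [2500.0] * 10
signal = sine_wave_replica([f1, f2, f3])
```

The resulting waveform has none of the surface properties listed above (no formant bandwidths, no waveform periodicity from voicing), yet its time-varying frequencies are the ones an articulating vocal tract would impose.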

We conclude that acoustic patterns are identified as speech by reference
to deep properties of a linguistic sort: if a sound can be "interpreted" by
the specialized phonetic module as the result of linguistically significant
gestures, then it is speech; otherwise, not. (In much the same way,
grammatical sentences can be distinguished from ungrammatical ones, not by
lists of surface properties but only by determining whether or not a
grammatical derivation can be given.) Of course, the kind of mechanism such an
"interpretation" requires is the kind of mechanism the motor theory presumes.

Phonetic and auditory responses to the cues. Obviously, a module that
acts on acoustic signals cannot respond beyond the physiological limits of
those parts of the auditory system that transmit the signal to the module.
Within those limits, however, different modules can be sensitive to the
signals in different ways. Thus, the auditory-localization module enables
listeners to perceive differences in the position of sounding objects given
temporal disparity cues smaller by several orders of magnitude than those
required to make the listener aware of temporal disparity as such (Brown &
Deffenbacher, 1979, Chap. 7; Hirsh, 1959). If there is, as the motor theory
implies, a distinct phonetic module, then in like manner its sensitivities
should not, except by accident, be the same as those that characterize the
module that deals with the sounds of nonspeech events.

In this connection, we noted in the first section of the paper that one
form of auditory theory of speech perception points to auditory
discontinuities in differential sensitivity (or in absolute identification),
taking these to be the natural bases for the perceptual discontinuities that
characterize the boundaries of phonetic categories. But several kinds of
experiments strongly imply that this is not so.

One kind of experiment has provided evidence that the perceptual
discontinuities at the boundaries of phonetic categories are not fixed;
rather, they move in accordance with the acoustic consequence of articulatory
adjustments associated with phonetic context, dialect, and rate of speech.
(For a review, see Repp & Liberman, in press.) To account for such
articulation-correlated changes in perceptual sensitivities by appeal to
auditory processes requires, yet again, an ultimately countless set of ad hoc
assumptions about auditory interactions, as well as the implausible assumption
that the articulators are always able to behave so as to produce just those
sounds that conform to the manifold and complex requirements that the auditory
interactions impose. It seems hardly more plausible that, as has been
suggested, the discontinuities in phonetic perception are really auditory
discontinuities that were caused to move about in phylogenetic or ontogenetic
development as a result of experience with speech (Aslin & Pisoni, 1980). The
difficulty with this assumption is that it presupposes the very canonical form
of the cues that does not exist (see above) and, also, that it implies a
contradiction in assuming, as it must, that the auditory sensitivities
underwent changes in the development of speech, yet somehow also remained
unchanged and nonetheless manifest in the adult's perception of nonspeech
sounds.

Perhaps this is the place to remark about categorical perception that the
issue is not, as is often supposed, whether nonspeech continua are
categorically perceived, for surely some do show tendencies in that direction.
The issue is whether, given the same (or similar) acoustic continua, the
auditory and phonetic boundaries are in the same place. If there are, indeed,
auditory boundaries, and if, further, these boundaries are replaced in
phonetic perception by boundaries at different locations (as the experiments
referred to above do indicate), then the separateness of phonetic and auditory
perception is even more strongly argued for than if the phonetic boundaries
had appeared on continua where auditory boundaries did not also exist.

Also relevant to comparison of sensitivity in phonetic and auditory modes
are experiments on perception of acoustic variations when, in the one case,
they are cues for phonetic distinctions, and when, in some other, they are
perceived as nonspeech. One of the earliest of the experiments to provide
data about the nonspeech side of this comparison dealt with perception of
frequency-modulated tones -- or "ramps" as they were called -- that bear a close
resemblance to the formant transitions. The finding was that listeners are
considerably better at perceiving the pitch at the end of the ramp than at the
beginning (Brady, House, & Stevens, 1961). Yet, in the case of stop
consonants that are cued by formant transitions, perception is better
syllable-initially than syllable-finally, though in the former case it
requires information about the beginning of the ramp, while in the latter it
needs to know about the end. Thus, if one were predicting sensitivity to
speech from sensitivity to the analogous nonspeech sounds, one would make
exactly the wrong predictions. More recent studies have made more direct
comparisons and found differences in discrimination functions when, in speech
context, formant transitions cued place distinctions among stops and liquids,
and when, in isolation, the same transitions were perceived as nonspeech
sounds (Mattingly et al., 1971; Miyawaki, Strange, Verbrugge, Liberman,
Jenkins, & Fujimura, 1975).

More impressive, perhaps, is evidence that has come from experiments in
which listeners are induced to perceive a constant stimulus in different ways.
Here belong experiments in which sine-wave analogues of speech, referred to
earlier, are presented under conditions that cause some listeners to perceive
them as speech and others not. The perceived discontinuities lie at different
places (on the acoustic continuum) for the two groups (Best et al., 1981; Best
& Studdert-Kennedy, 1983; Studdert-Kennedy & Williams, 1984; Williams,
Verbrugge, & Studdert-Kennedy, 1983). Here, too, belongs an experiment in
which the formant transitions appropriate to a place contrast between stop
consonants are presented with the remainder of a syllable in such a way as to
produce the duplex percept referred to earlier: the transitions cue a stop
consonant and, simultaneously, nonspeech chirps. The result is that listeners
yield quite different discrimination functions for exactly the same formant
transitions in exactly the same acoustic context, depending on whether they
are responding to the speech or nonspeech sides of the duplex percept; only on
the speech side of the percept is there a peak in the discrimination function
to mark a perceptual discontinuity at the phonetic boundary (Mann & Liberman,
1983).

Finally, we note that, apart from differences in differential sensitivity
to the transitions, there is also a difference in absolute-threshold
sensitivity when, in the one case, these transitions support a phonetic
percept, and when, in the other, they are perceived as nonspeech chirps.
Exploiting, again, the phenomenon of duplex perception, investigators found
that the transitions were effective (on the speech side of the percept) in
cueing the contrast between stops at a level of intensity 13 dB lower than
that required for comparable discrimination of the chirps (Bentin & Mann,
1983). At that level, indeed, listeners could not even hear the chirps, let
alone discriminate them; yet they could still use the transitions to identify
the several stops.
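
For readers who want a feel for the size of a 13 dB difference, the standard decibel definitions can be applied directly. The short sketch below is only that arithmetic; the function names are ours, and nothing in it comes from the study cited.

```python
def db_to_amplitude_ratio(db):
    """Amplitude (pressure) ratio for a level difference in dB."""
    return 10.0 ** (db / 20.0)


def db_to_power_ratio(db):
    """Power (intensity) ratio for a level difference in dB."""
    return 10.0 ** (db / 10.0)


# A level 13 dB lower corresponds to a difference of -13 dB.
amplitude_ratio = db_to_amplitude_ratio(-13.0)
power_ratio = db_to_power_ratio(-13.0)
```

On these definitions, a signal 13 dB lower has a little under a quarter of the amplitude, and about one twentieth of the power, of the reference signal.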

The Several Aspects of the Theory 

For the purpose of evaluating the motor theory, it is important to
separate it into its more or less independent parts. First, and fundamentally,
there is the claim that phonetic perception is perception of gesture. As we
have seen, this claim is based on evidence that the invariant source of the
phonetic percept is somewhere in the processes by which the sounds of speech 
are produced. In the first part of this section we will consider where in 
those processes the invariant might be found. 

The motor theory also implies a tight link between perception and
production. In the second part of this section we will ask how that link came
to be.

Where is the Invariant Phonetic Gesture? A phonetic gesture, as we have
construed it, is a class of movements by one or more articulators that results
in a particular, linguistically significant deformation, over time, of the
vocal-tract configuration. The linguistic function of the gesture is clear
enough: phonetic contrasts, which are of course the basis of phonological
categories, depend on the choice of one particular gesture rather than
another. What is not so clear is how the gesture relates to the actual
physical movements of articulators and to the resulting vocal-tract
configurations, observed, for example, in x-ray films.

In the early days of the motor theory we made a simplifying assumption 
about this relation: that a gesture was effected by a single key articulator. 
On this assumption, the actual movement trajectory of the articulator might
vary, but only because of aerodynamic factors and the physical linkage of this
articulator with others, so the neural commands in the final common paths
(observable with electromyographic techniques) would nevertheless be invariant
across different contexts. This assumption was appropriate as an initial
working hypothesis, if only because it was directly testable. In the event,
there proved to be a considerable amount of variability that the hypothesis
could not account for.

In formulating this initial hypothesis, we had overlooked several serious
complications. One is that a particular gesture typically involves not just
one articulator, but two or more; thus "lip rounding," for example, is a
collaboration of lower lip, upper lip, and jaw. Another is that a single
articulator may participate in the execution of two different gestures at the
same time; thus, the lips may be simultaneously rounding and closing in the
production of a labial stop followed by a rounded vowel, e.g., [bu]. Prosody
makes additional complicating demands, as when a greater displacement of some
or all of the active articulators is required in producing a stressed syllable
rather than an unstressed one; and linguistically irrelevant factors, notably
speaking rate, affect the trajectory and phasing of the component movements.



These complications might suggest that there is little hope of providing
a rigorous physical definition of a particular gesture, and that the gestures
are hardly more satisfactory as perceptual primitives than are the acoustic
cues. It might, indeed, be argued that there is an infinite number of
possible articulatory movements, and that the basis for categorizing one group
of such movements as "lip rounding" and another as "lip closure" is entirely
a priori.

But the case for the gesture is by no means as weak as this. Though we
have a great deal to learn before we can account for the variation in
instances of the same gesture, it is nonetheless clear that, despite such
variation, the gestures have a virtue that the acoustic cues lack: instances
of a particular gesture always have certain topological properties not shared
by any other gesture. That is, for any particular gesture, the same sort of
distinctive deformation is imposed on the current vocal-tract configuration,
whatever this "underlying" configuration happens to be. Thus, in lip
rounding, the lips are always slowly protruded and approximated to some
appreciable extent, so that the anterior end of the vocal tract is extended
and narrowed, though the relative contributions of the tongue and lips, the
actual degrees of protrusion and approximation, and the speed of articulatory
movement vary according to context. Perhaps this example seems obvious
because lip rounding involves a local deformation of the vocal-tract
configuration, but the generalization also applies to more global gestures.
Consider, for example, the gesture required to produce an "open" vowel. In
this gesture, tongue, lips, jaw, and hyoid all participate to contextually
varying degrees, and the actual distance between the two lips, as well as that
between the tongue blade and body and the upper surfaces of the vocal tract,
are variable; but the goal is always to give the tract a more open,
horn-shaped configuration than it would otherwise have had.

We have pointed out repeatedly that, as a consequence of gestural 
overlapping, the invariant properties of a particular gesture are not manifest 
in the spectrum of the speech signal. We would now caution that a further 
consequence of this overlapping is that, because of their essentially
topological character, the gestural invariants are usually not obvious from
inspection of a single static vocal-tract configuration, either. They emerge only
from consideration of the configuration as it changes over time, and from 
comparison with other configurations in which the same gesture occurs in dif- 
ferent contexts, or different gestures in the same context. 

We would argue, then, that the gestures do have characteristic invariant
properties, as the motor theory requires, though these must be seen, not as
peripheral movements, but as the more remote structures that control the
movements. These structures correspond to the speaker's intentions. What is
far from being understood is the nature of the system that computes the
topologically appropriate version of a gesture in a particular context. But
this problem is not peculiar to the motor theory; it is familiar to many who
study the control and coordination of movement, for they, like us, must
consider whether, given context-conditioned variability at the surface, motor
acts are nevertheless governed by invariants of some sort (Browman &
Goldstein, 1985; Fowler, Rubin, Remez, & Turvey, 1980; Tuller & Kelso, 1984;
Turvey, 1977).

The Origin of the Perception-Production Link. In the earliest accounts
of the motor theory, we put considerable attention on the fact that listeners
not only perceive the speech signal but also produce it. This, together with
doctrinal behaviorist considerations, led us to assume that the connection
between perception and production was formed as a wholly learned association,
and that perceiving the gesture was a matter of picking up the sensory
consequences of covert mimicry. On this view of the genesis of the
perception-production link, the distinguishing characteristic of speech is
only that it provides the opportunity for the link to be established.
Otherwise, ordinary principles of associative learning are adequate to the
task; no specialization for language is required.

But then such phenomena as have been described in this paper were discov- 
ered, and it became apparent that they differed from anything that association 
learning could reasonably be expected to produce. Nor were these the only 
relevant considerations. Thus, we learned that people who have been patholog- 
ically incapable from birth of controlling their articulators are nonetheless 
able to perceive speech (MacNeilage, Rootes, & Chase, 1967). From the
research pioneered by Eimas, Siqueland, Jusczyk, and Vigorito (1971), we also
learned that prelinguistic infants apparently categorize phonetic distinctions
much as adults do. More recently, we have seen that even when the distinction 
is not functional in the native language of the subjects, and when, according- 
ly, adults have trouble perceiving it, infants nevertheless do quite well up 
to about one year of age, at which time they begin to perform as poorly as 
adults (Werker & Tees, 1984). Perhaps, then, the sensitivity of infants to 
the acoustic consequences of linguistic gestures includes all those gestures
that could be phonetically significant in any language, acquisition of one's
native language being a process of losing sensitivity to gestures it does not
use. Taking such further considerations as these into account, we have become
even more strongly persuaded that the phonetic mode, and the percep- 
tion-production link it incorporates, are innately specified. 

Seen, then, as a view about the biology of language, rather than a
comment on the coincidence of speaking and listening, the motor theory bears
at several points on our thinking about the development of speech perception
in the child. Consider, first, a linguistic ability that, though seldom noted
(but see Mattingly, 1976), must be taken as an important prerequisite to
acquiring the phonology of a language. This is the ability to sort acoustic
patterns into two classes: those that contain (candidate) phonetic structures
and those that do not. (For evidence, however indirect, that infants do so 
sort, see Alegria & Noirot, 1982; Best, Hoffman, & Glanville, 1982; Entus, 
1977; Molfese, Freeman, & Palermo, 1975; Segalowitz & Chapman, 1980; Witelson, 
1977; but see Vargha-Khadem & Corballis, 1979). To appreciate the bearing of 
the motor theory on this matter, recall our claim, made in an earlier section, 
that phonetic objects cannot be perceived as a class by reference to acoustic 
stigmata, but only by a recognition that the sounds might have been produced 
by a vocal tract as it made linguistically significant gestures. If so, the
perception-production link is a necessary condition for recognizing speech as 
speech. It would thus be a blow to the motor theory if it could be shown that 
infants must develop empirical criteria for this purpose. Fortunately for the 
theory, such criteria appear to be unnecessary. 

Consider, too, how the child comes to know, not only that phonetic struc- 
tures are present, but, more specifically, just what those phonetic structures 
are. In this connection, recall that information about the string of phonetic 
segments is overlapped in the sound, and that there are, accordingly, no
acoustic boundaries. Until and unless the child (tacitly) appreciates the
gestural source of the sounds, it can hardly be expected to perceive, or ever
learn to perceive, a phonetic structure. Recall, too, that the acoustic cues
for a phonetic category vary with phonetic factors such as context and with
extra-phonetic factors such as rate and vocal-tract size. This is to say,
once again, that there is no canonical cue. What, then, is the child to
learn? Association of some particular cue (or set of cues) with a phonetic
category will work only for a particular circumstance. When circumstances
change, the child's identification of the category will be wrong, sometimes
grossly, and it is hard to see how it could readily make the appropriate
correction. Perception of the phonetic categories can properly be generalized
only if the acoustic patterns are taken for what they really are: information
about the underlying gestures. No matter that the child sometimes mistakes
the phonological significance of the gesture, so long as that which is
perceived captures the systematic nature of its relation to the sound; the
phonology will come in due course. To appreciate this relation is, once
again, to make use of the link between perception and production.

How "Direct" is Speech Perception? 



Since we have been arguing that speech perception is accomplished without 
cognitive translation from a first-stage auditory register, our position might 
appear similar to the one Gibson (1966) has taken in regard to "direct 
perception." The similarity to Gibson's views may seem all the greater because, like 
him, we believe that the object of perception is motoric. But there are 
important differences, the bases for which are to be seen in the following 
passage (Gibson, 1966, p. 94): 

An articulated utterance is a source of a vibratory field in the 
air. The source is biologically 'physical' and the vibration is 
acoustically 'physical'. The vibration is a potential stimulus, 
becoming effective when a listener is within range of the vibratory 
field. The listener then perceives the articulation because the 
invariants of vibration correspond to those of articulation. In 
this theory of speech perception, the units and parts of speech are 
present both in the mouth of the speaker and in the air between the 
speaker and listener. Phonemes are in the air. They can be 
considered physically real if the higher-order invariants of sound waves 
are admitted to the realm of physics. 

The first difference between Gibson's view and ours relates to the nature 
of the perceived events. For Gibson, these are actual movements of the 
articulators, while for us, they are the more remote gestures that the speaker 
intended. The distinction would be trivial if an articulator were affected by 
only one gesture at a time, but, as we have several times remarked, an 
articulator movement is usually the result of two or more overlapping gestures. 
The gestures are thus control structures for the observable movements. 

The second difference is that, unlike Gibson, we do not think 
articulatory movements (let alone phonetic structures) are given directly (that is, 
without computation) by "higher-order invariants" that would be plain if only 
we had a biologically appropriate science of physical acoustics. We would 
certainly welcome any demonstration that such invariants did exist, since, 
even though articulatory movement is not equivalent to phonetic structure, 
such a demonstration would permit a simpler account of how the phonetic module 
works. But no higher-order invariants have thus far been proposed, and we 
doubt that any will be forthcoming. We would be more optimistic on this score 
if it could be shown, at least, that articulatory movements can be recovered 
from the signal by computations that are purely analytic, if nevertheless 
complex. One might then hope to reformulate the relationship between movements 
and signal in a way that would make it possible to appeal to higher-order 
invariants and thus obviate the need for computation. But, given the 
many-to-one relation between vocal-tract configurations and acoustic signal, a 
purely analytic solution to the problem of recovering movements from the sig- 
nal seems to be impossible unless one makes unrealistic assumptions about 
excitation, damping, and other physical variables (Sondhi, 1979). We there- 
fore remain skeptical about higher-order invariants. 

The alternative to an analytic account of speech perception is, of 
course, a synthetic one, in which case the module compares some parametric 
description of the input signal with candidate signal descriptions. As with 
any form of "analysis-by-synthesis" (cf. Stevens & Halle, 1967), such an 
account is plausible only if the number of candidates the module has to test can 
be kept within reasonable bounds. This requirement is met, however, if, as we 
suppose, the candidate signal descriptions are computed by an analog of the 
production process (an internal, innately specified vocal-tract synthesizer, 
as it were; Liberman, Mattingly, & Turvey, 1972; Mattingly & Liberman, 
1969) that incorporates complete information about the anatomical and 
physiological characteristics of the vocal tract and also about the articulatory and 
acoustic consequences of linguistically significant gestures. Further 
constraints become available as experience with the phonology of a particular 
language reduces the inventory of possible gestures and provides information 
about the phonotactic and temporal restrictions on their occurrence. The 
module has then merely to determine which (if any) of the small number of 
gestures that might have been initiated at a particular instant could, in 
combination with gestures already in progress, account for the signal. 
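
The selection step just described can be illustrated with a toy sketch. 
Everything in it is an assumption made for illustration: the gesture names, 
the three-number "signal descriptions," the additive rule for overlapping 
gestures, and the squared-error mismatch are hypothetical stand-ins, not a 
model of the vocal tract or of the module. 

```python
# Toy analysis-by-synthesis: pick the candidate gesture (if any) that, added to
# the gestures already in progress, best accounts for the observed signal.
# All names and numbers below are hypothetical illustrations.

GESTURE_EFFECTS = {
    "lip_closure": (1.0, 0.0, 0.5),
    "tongue_raise": (0.0, 1.0, 0.25),
    "velum_lower": (0.5, 0.5, 0.0),
}


def synthesize(gestures):
    """Stand-in for the internal synthesizer: overlapping gestures simply add."""
    total = [0.0, 0.0, 0.0]
    for gesture in gestures:
        for i, value in enumerate(GESTURE_EFFECTS[gesture]):
            total[i] += value
    return tuple(total)


def mismatch(observed, predicted):
    """Squared error between two parametric signal descriptions."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))


def best_candidate(signal, in_progress, candidates):
    """Return (gesture, error); gesture is None if no new gesture fits best."""
    best, best_error = None, mismatch(signal, synthesize(in_progress))
    for candidate in candidates:
        error = mismatch(signal, synthesize(in_progress + [candidate]))
        if error < best_error:
            best, best_error = candidate, error
    return best, best_error


# A signal produced by two overlapping gestures, one of them already under way.
observed = synthesize(["tongue_raise", "lip_closure"])
choice, error = best_candidate(
    observed, ["tongue_raise"], ["lip_closure", "velum_lower"]
)
print(choice, error)
```

The point of the sketch is only that the search stays small: the candidates 
are restricted to a short list of gestures, and each is tested in combination 
with what is already in progress rather than against every possible signal. 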

Thus, we would claim that the processes of speech perception are, like 
other linguistic processes, inherently computational and quite indirect. If 
perception seems nonetheless immediate, it is not because the process is in 
fact straightforward, but because the module is so well-adapted to its complex 
task. 

The Motor Theory and Modularity 

In attributing speech perception to a "module," we have in mind the 
notion of modularity proposed by Fodor (1983). A module, for Fodor, is a piece 
of neural architecture that performs the special computations required to 
provide central cognitive processes with representations of objects or events 
belonging to a natural class that is ecologically significant for the organism. 
This class, the "domain" of the module, is apt also to be "eccentric," for the 
domain would be otherwise merely a province of some more general domain, for 
which another module must be postulated anyway. Besides domain-specificity 
and specialized neural architecture, a module has other characteristic 
properties. Because the perceptual process it controls is not cognitive, there is 
little or no possibility of awareness of whatever computations are carried out 
within the module ("limited central access"). Because the module is 
specialized, it has a "shallow" output, consisting only of rigidly definable, 
domain-relevant representations: accordingly, it processes only the 
domain-relevant information in the input stimulus. Its computations are thus much faster 
than those of the less specialized processes of central cognition. Because of 
the ecological importance of its domain for the organism, the operation of the 
module is not a matter of choice, but "mandatory"; for the same reason, its 
computations are "informationally encapsulated," that is, protected from 
cognitive bias. 

Most psychologists would agree that auditory localization, to return to 
an example we have mentioned several times, is controlled by specialized pro- 
cesses of some noncognitive kind. They might also agree that its properties 
are those that Fodor assigns to modules. At all events, they would set audi- 
tory localization apart from such obviously cognitive activities as playing 
chess, proving theorems, and recognizing a particular chair as a token of the 
type called "chair." As for perception of language, the consensus is that it 
qualifies as a cognitive process par excellence, modular only in that it is 
supported by the mechanisms of the auditory modality. But in this, we and 
Fodor would argue, the consensus is doubly mistaken: the perception of lan- 
guage is neither cognitive nor auditory. The events that constitute the do- 
main of linguistic perception, however they may be defined, must certainly be 
an ecologically significant natural class, and it has been recognized since 
Broca that linguistic perception is associated with specialized neural 
architecture. Evidently, linguistic perception is fast and mandatory; argu- 
ably, it is informationally encapsulated — that is, its phonetic, morphological 
and syntactic analyses are not biassed by knowledge of the world — arid its out- 
put is shallow — that is, it produces a linguistic description of the utter- 
ance, and only this. These and other considerations suggest tnat, like audi- 
tory localization, perception of language rests on a specialization of the 
kind that Fodur calls a module. 

The data that have led us in the past to claim that "speech is special" 
and to postulate a "speech mode" of perception can now be seen to be 
consistent with Fodor's claims about modularity, and especially about the modularity 
of language. (What we have been calling a phonetic module is then more 
properly called a linguistic module.) Thus, as we have noted, speech perception 
uses all the information in the stimulus that is relevant to phonetic 
structures: every potential cue proves to be an actual cue. This holds true even 
across modalities: relevant optical information combines with relevant 
acoustic information to produce a coherent phonetic percept in which, as in the 
example described earlier, the bimodal nature of the stimulation is not 
detectable. In contrast, irrelevant information in the stimulus is not used: 
the acoustic properties that might cause the transitions to be heard as chirps 
are ignored (or perhaps we should say that the auditory consequences of those 
properties are suppressed) when the transitions are in context and the 
linguistic module is engaged. The exclusion of the irrelevant extends, of 
course, to stimulus information about voice quality, which helps to identify 
the speaker (perhaps by virtue of some other module) but has no phonetic 
importance, and even to that extraphonetic information which might have been 
supposed to help the listener distinguish sounds that contain phonetic 
structures from those that do not. As we have seen, even when synthetic speech 
lacks the acoustic properties that would make it sound natural, it will be 
treated as speech if it contains sufficiently coherent phonetic information. 
Moreover, it makes no difference that the listener knows, or can determine on 
auditory grounds, that the stimulus was not humanly produced; because 
linguistic perception is informationally encapsulated and mandatory, the listener 
will hear synthetic speech as speech. 

As might be expected, the linguistic module is also very good at 
excluding from consideration the acoustic effects of unrelated objects and events in 
the environment; the resistance of speech perception to noise and distortion 
is well known. These other objects and events are still perceived, because 
they are dealt with by other modules, but they do not, within surprisingly 
wide limits, interfere with speech perception (cf. Darwin, 1984). On the 
other hand, the module is not necessarily prepared for non-ecological conditions, 
as the phenomenon of duplex perception illustrates. Under the conditions of 
duplex perception the module makes a mistake it would never normally make: it 
treats the same acoustic information both as speech and as nonspeech. And, 
being an informationally encapsulated and mandatorily operating mechanism, it 
keeps on making the same mistake, whatever the knowledge or preference of the 
listener. 

Our claim that the invariants of speech perception are phonetic gestures 
is much easier to reconcile with a modular account of linguistic perception 
than with a cognitive account. On the latter view, the gestures would have to 
be inferred from an auditory representation of the signal by some cognitive 
process, and this does not seem to be a task that would be particularly 
congenial to cognition. Parsing a sentence may seem to bear some distant 
resemblance to the proving of theorems, but disentangling the mutually 
confounding auditory effects of overlapping articulations surely does not. It 
is thus quite reasonable for proponents of a cognitive account to reject the 
possibility that the invariants are motoric and to insist that they are to be 
found at or near the auditory surface, heuristic matching of auditory tokens 
to auditory prototypes being perfectly plausible as a cognitive process. 

Such difficulties do not arise for our claim on the modular account. If 
the invariants of speech are phonetic gestures, it merely makes the domain of 
linguistic perception more suitably eccentric; if the invariants were 
auditory, the case for a separate linguistic module would be the less compelling. 
Moreover, computing these invariants from the acoustic signal is a task for 
which there is no obvious parallel among cognitive processes. What is 
required for this task is not a heuristic process that draws on some general 
cognitive ability or on knowledge of the world, but a special-purpose 
computational device that relates gestural properties to the acoustic patterns. 

It remains, then, to say how the set of possible gestures is specified 
for the perceiver. Does it depend on tacit knowledge of a kind similar, 
perhaps, to that which is postulated by Chomsky to explain the universal 
constraints on syntactic and phonological form? We think not, because knowledge 
of the acoustic-phonetic properties of the vocal tract, unlike other forms of 
tacit knowledge, seems to be totally inaccessible: no matter how hard they 
try, even post-perceptually, listeners cannot recover aspects of the 
process (for example, the acoustically different transitions) by which they might 
have arrived at the distal object. But, surely, this is just what one would 
expect if the specification of possible vocal-tract gestures is not tacit 
knowledge at all, but rather a direct consequence of the eccentric properties 
of the module itself. As already indicated, we have in earlier papers 
suggested that speech perception is accomplished by virtue of a model of the 
vocal tract that embodies the relation between gestural properties and acoustic 
information. Now we would add that this model must be part of the very 
structure of the language module. In that case, there would be, by Fodor's 
account, an analogy with all other linguistic universals. 

Perception and Production: One Module or Two? 



For want of a better word, we have spoken of the relation between speech 
perception and speech production as a "link," perhaps implying thereby that 
these two processes, though tightly bonded, are nevertheless distinct. Much 
the same implication is carried, more generally, by Fodor's account of 
modularity, if only because his attention is almost wholly on perception. We 
take pains, therefore, to disown the implication of distinctness that our own 
remarks may have conveyed, and to put explicitly in its place the claim that, 
for language, perception and production are only different sides of the same 
coin. 

To make our intention clear, we should consider how language differs from 
those other modular arrangements in which, as with language, perception and 
action both figure in functional unity: simple reflexes, for example; or 
the system that automatically adjusts the posture of a diving gannet in 
accordance with optical information that specifies the time of contact with 
the surface of the water (Lee & Reddish, 1981). The point about such systems 
is that the stimuli do not resemble the responses, however intimate the 
connection between them. Hence, the detection of the stimulus and the 
initiation of the response must be managed by separate components of the module. 
Indeed, it would make no great difference if these cases were viewed as an in- 
put module hardwired to an output module. 

Language is different: the neural representation of the utterance that 
determines the speaker's production is the distal object that the listener 
perceives; accordingly, speaking and listening are both regulated by the same 
structural constraints and the same grammar. If we were to assume two 
modules, one for speaking and one for listening, we should then have to explain 
how the same structures evolved for both, and how the representation of the 
grammar acquired by the listening module became available to the speaking 
module. 

So, if it is reasonable to assume that there is such a thing as a 
language module, then it is even more reasonable to assume that there is only 
one. And if, within that module, there are subcomponents that correspond to 
the several levels of linguistic performance, then each of these subcomponents 
must deal both with perception and production. Thus, if sentence planning is 
the function of a particular subcomponent, then sentence parsing is a function 
of the same subcomponent, and similarly, mutatis mutandis, for speech 
production and speech perception. And, finally, if all this is true, then the 
corresponding input and output functions must themselves be as computationally 
similar as the inherent asymmetry between production and perception permits, 
just as they are in man-made communication devices. 

These speculations do not, of course, reveal the nature of the 
computations that the language module carries out, but they do suggest a powerful 
constraint on our hypotheses about them, a constraint for which there is no 
parallel in the case of other module systems. Thus, they caution that, among 
all plausible accounts of language input, we should take seriously only those 
that are equally plausible as accounts of language output; if a hypothesis 
about parsing cannot be readily restated as a hypothesis about 
sentence-planning, for example, we should suppose that something is wrong with it. 

Whatever the weaknesses of the motor theory, it clearly does conform to 
this constraint, since, by its terms, speech production and speech perception 
are both inherently motoric. On the one side of the module, the motor 
gestures are not the means to sounds designed to be congenial to the ear; rather, 
they are, in themselves, the essential phonetic units. On the other side, the 
sounds are not the true objects of perception, made available for linguistic 
purposes in some common auditory register; rather, they only supply the 
information for immediate perception of the gestures. 



LINGUISTIC AND ACOUSTIC CORRELATES OF THE PERCEPTUAL STRUCTURE FOUND IN AN 
INDIVIDUAL DIFFERENCES SCALING STUDY OF VOWELS* 



Brad Rakerd† and Robert R. Verbrugge†† 



Abstract. Subjects judged the similarities among a set of American 
English vowels (/i, ɪ, ɛ, æ, ʌ, ɑ, ɔ, o, ʊ, u/) presented in isolation or in 
a /dVd/ consonantal frame. Individual differences scaling was 
employed to analyze these similarities data for each of the conditions 
separately and for the two conditions combined. In all cases, 
perceptual dimensions corresponding to the advancement, height, and 
tenseness vowel features were recovered. Given the determinacy of 
individual differences scaling, this finding is taken to provide 
strong evidence for the perceptual significance of those features. 
The perceptual dimensions are considered in relation to various 
acoustic parameters of the stimuli employed in this study. They are 
also considered in relation to perceptual dimensions that have been 
observed in other vowel scaling studies. 

Introduction 

Multidimensional scaling provides a means of modeling the psychological 
structure that is reflected in perceptual judgments. Scaling is particularly 
useful because judgments regarding a large number of stimuli can very often be 
modeled with a structure of relatively few dimensions, and because those di- 
mensions can then be interpreted in terms of properties familiar to an 
investigator (Carroll & Wish, 1974; Kruskal & Wish, 1978). In the domain of 
vowel perception, investigators have frequently found that the dimensions 
revealed by scaling can be related to various phonological features, a fact 
which is taken to imply that those features play a significant perceptual role 
(e.g., Fox, 1983; Singh & Woods, 1970; Shepard, 1972). 

The strength of that implication is, however, contingent on the type of 
scaling method that is used in a study. One class of scaling techniques 
yields solutions for which no single interpretation is possible. This owes to 
the fact that the models of psychological structure, which are spatial in 



*Journal of the Acoustical Society of America, 1985, 77, 296-301. 
†Michigan State University, Department of Psychology. 
††Bell Laboratories. 

Acknowledgment. This report is based on portions of a doctoral dissertation 
submitted by the first author to the University of Connecticut. The 
research was conducted at Haskins Laboratories and funded by grants awarded to 
that institution (NIH grants HD-01994, RR-05596). Support was also provided 
by the University of Connecticut Research Foundation. We thank Alvin Liber- 
man, Michael Turvey, and Benjamin Sachs for their comments on the manu- 
script. 

[HASKINS LABORATORIES: Status Report on Speech Research SR-82/83 (1985)] 




Rakerd & Verbrugge: Perceptual Structure of Vowels 



character, lack a fixed orientation for their axes. One must therefore rotate 
these structures in search of an orientation that permits interpretation of 
the dimensions. There are an infinite number of possible rotations and the 
search must be constrained by an investigator's a priori notions regarding 
interpretation. Any conclusions drawn are correspondingly vulnerable to the 
challenge that some alternative interpretation would have been equally 
supported by the data had some other rotation been carried out. 
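
The indeterminacy can be made concrete with a small sketch. It assumes a 
purely hypothetical two-dimensional configuration and an ordinary 
(unweighted) Euclidean distance model; the coordinates are invented for 
illustration and are not data from any study. Rotating the configuration 
changes every coordinate but leaves every interpoint distance unchanged, so 
the distances alone cannot favor one orientation of the axes over another. 

```python
import math

# Hypothetical stimulus coordinates in a two-dimensional spatial model.
POINTS = [(0.0, 0.0), (1.0, 0.5), (-0.5, 2.0), (2.0, -1.0)]


def rotate(points, angle):
    """Rotate every point about the origin by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in points]


def pairwise_distances(points):
    """Euclidean distance for every unordered pair of points."""
    return [
        math.dist(p, q)
        for i, p in enumerate(points)
        for q in points[i + 1:]
    ]


rotated_points = rotate(POINTS, 0.7)  # an arbitrary rotation
original = pairwise_distances(POINTS)
rotated = pairwise_distances(rotated_points)

# The coordinates differ, but the modeled distances agree to rounding error.
same = all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(original, rotated))
print(same)
```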

A second class of scaling techniques cannot be challenged on these 
grounds because they specify a fixed orientation of dimension axes for their 
models of psychological structure. This class, the individual differences 
scaling techniques, achieves its added determinacy by modeling multiple sets 
of data simultaneously, each set reflecting the performance of a different 
subject.¹ An important underlying assumption of individual differences 
scaling is that when judging a common set of stimuli subjects can differ from one 
another in terms of the relative weights they attach to a set of shared 
perceptual dimensions, but not in terms of the identity of the dimensions 
themselves (Carroll & Chang, 1970). Except in unusual cases, there is one and 
only one orientation in which the shared dimensions can be weighted so as to 
account optimally for the variance in those subjects' data. That is the 
orientation recovered by individual differences scaling. It has been 
conjectured that with a well-defined perceptual task the dimensions revealed by 
individual differences scaling will correspond to fundamental sensory or 
judgmental processes (Carroll & Chang, 1970). There are a number of instances in 
which that conjecture has been supported (Wish & Carroll, 1974). 
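
A minimal sketch may help show why subject weights fix the orientation. It 
assumes the weighted Euclidean distance model usually associated with 
individual differences scaling, in which a subject's distance between two 
stimuli is the square root of the weighted sum of squared coordinate 
differences on the shared dimensions. The coordinates and weights below are 
hypothetical. 

```python
import math


def weighted_distance(x, y, weights):
    """Distance between stimuli x and y for a subject with the given weights."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y)))


ORIGIN = (0.0, 0.0)
ALONG_D1 = (1.0, 0.0)  # differs from ORIGIN on dimension 1 only
ALONG_D2 = (0.0, 1.0)  # the same point after a 90-degree rotation of the space

equal_weights = (1.0, 1.0)    # a subject who weights both dimensions alike
unequal_weights = (4.0, 1.0)  # a subject who stresses dimension 1

# With equal weights the rotation is invisible in the modeled distance.
before_equal = weighted_distance(ORIGIN, ALONG_D1, equal_weights)
after_equal = weighted_distance(ORIGIN, ALONG_D2, equal_weights)

# With unequal weights the same rotation changes the modeled distance.
before_unequal = weighted_distance(ORIGIN, ALONG_D1, unequal_weights)
after_unequal = weighted_distance(ORIGIN, ALONG_D2, unequal_weights)

print(before_equal, after_equal, before_unequal, after_unequal)
```

In this toy case, a rotated space no longer reproduces the distances of the 
subject with unequal weights, which is the sense in which differences among 
subjects constrain the orientation of the shared axes. 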

In this paper, we report on an individual differences scaling study of 
vowel perception. It was conducted to address questions about the potential 
influences that consonantal context can exert on vowel perception, and else- 
where (Rakerd, 1984) we have considered the results in that regard. We did so 
by comparing the weights that subjects attached to a set of shared perceptual 
dimensions, depending on whether they heard vowels in or out of a consonantal 
frame. Our concern here is not with the weights, however, but with the shared 
dimensions themselves. Those dimensions can be usefully compared with 
linguistic features that have been found to be related to perceptual structure 
in other scaling studies (e.g., Fox, 1982, 1983; Terbeek, 1977), particularly 
those conducted with less determinate scaling techniques (Hanson, 1967; Pols, 
van der Kamp, & Plomp, 1969; Shepard, 1972; Singh & Woods, 1970). That is the 
first purpose of this paper. We examine subjects who judged vowels in 
consonantal context and subjects who judged isolated vowels, analyzing their 
data both separately and in combination. 

The second purpose of this paper is to report on correlations between the 
perceptual structure revealed by individual differences scaling and various 
acoustic parameters of our vowel stimuli. Though based on a limited number of 
stimulus tokens, those correlations are suggestive in that they speak to 
hypotheses that previous investigators have put forth regarding relationships 
between vowel features and the acoustic signal. 



A. Subjects 

Twenty-three subjects participated in this experiment. All of them were 
native speakers of English with normal hearing according to self-report. 
Twelve of the subjects were randomly assigned to make perceptual judgments 
regarding vowels in consonantal context. The remaining 11 subjects judged 
vowels in isolation. 

B. Stimuli 

The stimuli for the experiment were ten different American English vowels 
(/i, ɪ, ɛ, æ, ʌ, ɑ, ɔ, o, ʊ, u/) spoken by a male talker with a general American 
dialect. For the consonantal context condition, he produced those vowels in the 
trisyllabic frame /hədVdə/, with stress placed on the second syllable (/dVd/). 
For the isolated condition, he produced them with no surrounding phonetic 
context (/#V#/). Two tokens of each vowel were produced in each condition. 
Recordings of these tokens were digitized at a sampling rate of 10 kHz and 
stored in separate computer files. 

C. Procedure 

Subjects were tested individually. Their task was to judge the similari- 
ty relations that they perceived among the ten different vowels. They were 
instructed to base those judgments on properties of the vowel sounds that 
seemed to them to distinguish words in English (Carlson & Granström, 1979; 
Klatt, 1979). The similarity judgments were made with a triadic comparisons 
method that has been employed in previous vowel perception studies (Pols et 
al., 1969; Terbeek, 1977; Terbeek & Harshman, 1971). According to this 
procedure, three of the ten vowels were rated on each experimental trial. Subjects 
listened to these vowels in any order that they chose and as often as they 
chose.² They then reported which two of the three vowels sounded most alike 
to them and which two least alike. Over trials, all possible triads were 
judged. The judgments were then summed across trials, with a score of +1 
assigned to all most-alike pairs and -1 to all least-alike pairs. This yielded 
a single (symmetrical) matrix of similarity judgments for each subject. 
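
The scoring rule can be sketched in a few lines. The sketch below is an 
illustration only, not the software used in the study: the vowel labels are 
placeholders, and the "listener" is a hypothetical rule that judges 
similarity from position on a single made-up scale. With ten stimuli there 
are 120 possible triads, and each pair occurs in eight of them, so summed 
scores can range from -8 to +8. 

```python
from itertools import combinations

# Placeholder labels for ten stimuli and a made-up one-dimensional scale.
VOWELS = [f"v{i}" for i in range(10)]
POSITION = {vowel: index for index, vowel in enumerate(VOWELS)}


def toy_judge(triad):
    """Hypothetical listener: closest pair is most alike, farthest least alike."""
    pairs = list(combinations(triad, 2))

    def gap(pair):
        return abs(POSITION[pair[0]] - POSITION[pair[1]])

    return min(pairs, key=gap), max(pairs, key=gap)


def score_triads(vowels, judge):
    """Sum +1 for each most-alike pair and -1 for each least-alike pair."""
    # Keys are unordered pairs, so the resulting matrix is symmetrical.
    similarity = {frozenset(pair): 0 for pair in combinations(vowels, 2)}
    for triad in combinations(vowels, 3):
        most_alike, least_alike = judge(triad)
        similarity[frozenset(most_alike)] += 1
        similarity[frozenset(least_alike)] -= 1
    return similarity


triads = list(combinations(VOWELS, 3))
similarity = score_triads(VOWELS, toy_judge)
print(len(triads), len(similarity))
```

Because every triad contributes one +1 and one -1, the entries of a 
subject's matrix always sum to zero under this rule. 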

D. Data analysis 

The matrices for all 23 subjects who participated in the experiment were 
submitted to nonmetric individual differences scaling, using the ALSCAL 
procedure developed by Takane, Young, and de Leeuw (1977). It was determined that a 
three-dimensional scaling solution was most appropriate for the data. That 
decision was based on several factors. First, modeling in three dimensions 
accounted for a substantially greater percentage of variance (an average of 
70% for each subject) than modeling in two dimensions (60%), and only 
marginally less than modeling in four dimensions (72%). Second, the three 
dimensions were readily interpretable from a linguistic standpoint. And finally, 
those dimensions were quite stable, in that they were also found in separate 
analyses of the two experimental conditions (see Sec. II) and, with certain 
modeling constraints, in the scaling solution for a memory study (Rakerd, 
1984) that complemented this perceptual study. 

For additional details concerning the data analysis, as well as other 
aspects of the experimental method, see Rakerd (1984). 

II. The Scaling Solutions 

We first consider the perceptual dimensions that emerged from an analysis 
of data matrices for all 23 subjects. Although these dimensions have been 
described elsewhere by Rakerd (1984), they are examined here in greater detail, 
with particular attention paid to comparisons with phonological features of 
linguistic description for vowels, and with dimensions that have been reported 
in previous scaling studies of vowels. In the second part of Sec. II, we de- 
scribe the perceptual dimensions that resulted from separate analyses of the 
consonantal-context and isolated conditions of the study. 

A. The two conditions combined 

1. Dimensions 1 and 2 

Dimension 2 (D2) of the scaling solution for all subjects is plotted 
against dimension 1 (D1) in the top half of Fig. 1. The distribution of 
vowels in this plane is clearly related to the traditional "vowel quadrilateral" 
(Ladefoged, 1975; Lindau, 1978), with D1 corresponding to the advancement 
feature of vowels,³ and D2 to the height feature. There is considerable 
precedent for observing correlates of these two phonological features in vowel 
scaling studies (Fox, 1982, 1983; Hanson, 1967; Pols et al., 1969; Shepard, 
1972; Singh & Woods, 1970). Those findings, together with the results of the 
present study, strongly support the view that the advancement and height 
features play a significant role in the perception of vowels in English. The 
findings are also consistent with the larger view that advancement and height 
enjoy a special status in all languages (Lindau, 1978). 

2. Dimension 3 

The third dimension of the combined group space (D3) is plotted against 
D1 in the bottom half of Fig. 1. The vowels are ordered along it such that 
/ɪ, æ, ɛ, ʌ, ʊ/ have negative values and /i, ɑ, ɔ, o, u/ have positive values. The 
former are lax vowels, the latter tense. Hence, D3 can be interpreted as cor- 
responding to the tenseness feature. Unlike advancement and height, a tense- 
ness dimension has very rarely been recovered in vowel scaling studies. To 
our knowledge, only Anglin (1971; cited in Singh, 1976), who scaled similarity 
judgments for vowels in /hVd/ context, has recovered a dimension similar to 
D3. In that analysis, the scaling method did not yield a single, interpret- 
able orientation for the model of psychological structure. The present, more 
determinate scaling result might therefore be taken to provide the strongest 
available evidence for perceptual significance of the tenseness feature. 

B. Separate analyses of the conditions 

When perceptual judgments for the isolated and consonantal-context condi- 
tions were scaled separately, in three dimensions, the amount of variance that 
could be accounted for in the data (VAF) improved marginally over its 
corresponding value in the combined analysis. (VAF for analysis of the isolated 
condition was 74%, that for the consonantal-context condition was 72%. This 
compares with 70% in the combined analysis.) This marginal improvement 
resulted from some local shifts in the positioning of vowels in the separate scaling 
solutions. As will be seen, the global structure nevertheless remained quite 
similar to that of the combined analysis. 

1. The isolated condition 

The perceptual dimensions for the isolated condition are shown in Fig. 2. 
Only D2 is notably different from the corresponding dimensions of the combined 
analysis (see Fig. 1). Along this dimension, the vowels /e/ and /o/ have 
assumed values that are somewhat more positive than they had been previously. 
The movement of /e/ principally reflects the fact that /e/ and /i/ were judged 
to be highly similar, indeed, the most similar of all vowel pairs in the 
isolated condition. Likewise, the movement of /o/ is largely dictated by a 
single vowel pairing; /o/ and /u/ were judged to be extremely similar in 
isolation, perhaps reflecting the fact that they were the only two diphthongized 
vowels in the isolated set. 

Figure 1. Perceptual structure for subjects from the two experimental 
conditions combined. This figure is reproduced from Rakerd (1984) by 
permission of The Psychonomic Society. 

Despite repositioning of these two vowels, the dimensions of the isolated 
solution maintain a strong correspondence with the advancement, height, and 
tenseness features, respectively. 

This analysis can be usefully compared with one by Singh and Woods (1970) 
in which it was found that tenseness had no perceptual significance for 
listeners who rated the relative similarity of isolated vowels. Those 
investigators attributed their finding to listeners' knowledge that isolated 
lax vowels are phonologically impermissible in English. The outcome of the 
present study indicates that there may have been other factors at work as 
well. For several of our isolated-vowels subjects, the tenseness dimension 
(D3) did, indeed, have little or no perceptual salience, but for others it was 
the most heavily weighted dimension (Rakerd, 1984). Perhaps talkers produced
their isolated vowels differently in the Singh and Woods study, or perhaps, by
averaging their data over subjects prior to scaling, Singh and Woods lost any
statistical evidence of the significance of tenseness. Whatever the case, it
is apparent that under certain conditions listeners can attend to the
tenseness dimension of isolated vowels, despite the phonological restriction.

2. The consonantal-context condition

Perceptual dimensions for the separate analysis of the consonantal-context
condition are shown in Fig. 3. D1 and D2 are quite similar to their
counterparts in the combined analysis (Fig. 1), again reflecting sensitivity
to advancement and vowel height, respectively. Along the third dimension,
there is some divergence from the combined solution, with the vowel /i/ moving
in a more positive direction. This movement resulted from the fact that the
/i-ɪ/ vowel pair was judged highly similar in consonantal context.
Nevertheless, D3 retains a correspondence with tenseness.

3. Stability of the scaling solutions 

The agreement among these separate scaling solutions and the combined 
solution is evidence of the stability of this modeling outcome. Perceptual 
dimensions closely related to advancement, height, and tenseness were recov- 
ered in all cases, which makes it extremely unlikely that their emergence in 
any individual case was a coincidental consequence of the scaling analysis it- 
self. 

III. Acoustic Correlates of the Perceptual Dimensions 

We computed correlations to assess the strength of relationships between 
the perceptual dimensions revealed by our combined scaling analysis and vari- 
ous acoustic parameters of the vowel stimuli. The acoustic measurements were 
made from wideband spectrograms. In the case of isolated vowels, center
frequencies of the first three formants (F1, F2, and F3) were measured at a
point approximately halfway through each token. Duration of voicing was also
measured for the isolated vowels.

Figure 3. Perceptual structure for subjects from the consonantal-context
condition.

The acoustic structure of the /dVd/ syllables comprised an onglide (or
period of syllable-initial formant transition) and an offglide (syllable-final
transition), with little or no region in which the formants could be described
as maintaining a steady-state frequency. Therefore, we adopted the convention
of measuring F1, F2, and F3 at the end of the onglide (a point also
representing the beginning of the offglide). Duration was measured from the
first evidence of voicing following initial /d/ release to the last prior to
final /d/ closure. Last, we computed the proportion of total syllable duration
that was taken up by the offglide.

Recall that there were two tokens of each vowel in each context. The mean
parameter values for those two tokens are listed in Table 1. Isolated-vowel
parameters appear in the top half of the table, /dVd/ parameters in
the bottom half. An examination of Table 1 shows that the stimuli were
acoustically "normal" in the sense that their parameters were roughly
comparable to those that other investigators have reported for much larger data
bases (Klatt, 1975; Peterson & Barney, 1952; Peterson & Lehiste, 1960; Umeda,
1975). The data also provide evidence of vowel reduction (Joos, 1948;
Lindblom, 1963) in consonantal context. Formant frequency differences among the
vowels were smaller in the /dVd/ condition than in isolation.

Rank-order correlations (Spearman's rho) were computed between the acous- 
tic data reported in Table 1 and coordinates for the perceptual dimensions of 
the combined analysis. The results are reported in Table 2. First consider 
correlations for the isolated vowels, which appear in the top half of the 
table. The following correlations (and no others) proved significant: D1 
(which we have interpreted as advancement) with F2 and F3, D2 (height) with
F1, and D3 (tenseness) with duration. The findings regarding D1 and D2 are
anticipated by a number of previous scaling studies (Fox, 1982, 1983; Pols et
al., 1969; Shepard, 1972). The finding for D3 is consistent with the report
that vowel tenseness is related to duration (Peterson & Lehiste, 1960).
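
The rank-order statistic used here can be sketched in a few lines. The Python fragment below is an illustration only: the D3 coordinates and durations are invented placeholders (they are not the values of Tables 1 and 2), and the function names are ours. It ranks each variable, averaging ranks over ties, and then takes the Pearson correlation of the ranks, which is the standard definition of Spearman's rho.

```python
def ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical values for ten vowels (NOT the data of Table 1).
d3_coordinate = [-0.41, -0.30, -0.25, -0.18, -0.35, 0.22, 0.31, 0.27, 0.36, 0.19]
duration_ms = [150, 190, 180, 200, 160, 230, 260, 250, 270, 240]

rho = spearman_rho(d3_coordinate, duration_ms)
print(round(rho, 3))
```

With only ten vowels per condition, a coefficient of this kind rests on very few observations, so its significance would have to be checked against critical values for the appropriate sample size.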

The bottom half of Table 2 shows correlations for vowels in /dVd/ context.
Note that relative to the isolated vowels there is a substantial reduction in
the strength of the correlation between D1 and F2 (0.72, down from 0.95) and
between D1 and F3 (0.66, down from 0.84). These statistical changes reflect
the fact that the high-back vowels /o,ʊ,u/ were radically reduced in /dVd/
context, as might be expected given the alveolar place of articulation of the
consonants. Though not unusual, this circumstance merits comment in that it
calls into question strong statements to the effect that the relationship
between the advancement feature and the formant structure of vowels is a
simple one (see, e.g., Lindau, 1978; Singh, 1976). Our finding is of the sort
that shows that this relationship is affected by the phonetic context in which
a vowel occurs.

It can also be seen in Table 2 that, in the consonantal-context
condition, duration was not significantly correlated with D3 (tenseness), as it
had been with isolated vowels. It appears that judgments regarding D3 could not
have been made on the basis of vowel duration in this condition. Apparently,
subjects' perceptions of tenseness were cued by some other acoustic property
in the /dVd/ context. A likely candidate is offglide proportion, which was
significantly correlated with D3. Indeed, it is possible to account perfectly
for at least the macrostructure of D3 ordering on the basis of offglide
proportion alone. Table 1 shows that the tense vowels, which all had positive
D3 coordinates, also had offglide proportions of 50% or less, and that the lax
vowels, which all had negative D3 coordinates, also had offglide proportions
of 60% or more. This finding is reminiscent of an observation made by Lehiste
and Peterson (1961), although our measurement procedures were somewhat
different from theirs. In both instances, tense vowels were found to be marked
by a relatively brief period of offglide into a following consonant, and lax
vowels by an offglide that was more substantial in duration.
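
The 50%/60% split described above amounts to a simple decision rule, sketched below in Python. This is an illustration under stated assumptions, not the procedure used in the study: the function names are ours, the onglide and offglide durations are made-up numbers, and the voiced syllable is treated as consisting of onglide plus offglide only (the text notes that there was little or no steady-state region). Proportions falling between 50% and 60% are left undecided because the text reports no vowels in that range.

```python
def offglide_proportion(onglide_ms, offglide_ms):
    """Share of the voiced syllable taken up by the offglide.

    Assumes the syllable is onglide + offglide with no steady state.
    """
    return offglide_ms / (onglide_ms + offglide_ms)


def classify_tenseness(proportion):
    """Apply the 50% / 60% split reported in the text."""
    if proportion <= 0.50:
        return "tense"
    if proportion >= 0.60:
        return "lax"
    return "undetermined"  # no stimuli are reported in this gap


# Hypothetical /dVd/ tokens: (onglide ms, offglide ms). Not measured values.
tokens = {"token_a": (120, 80), "token_b": (60, 140), "token_c": (90, 110)}

labels = {
    name: classify_tenseness(offglide_proportion(on, off))
    for name, (on, off) in tokens.items()
}
print(labels)
```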

A number of investigators have reported that vowels in consonantal con- 
text are identified with greater accuracy than isolated vowels (Gottfried & 
Strange, 1980; Rakerd et al., 1984; Strange, Edman, & Jenkins, 1976; Strange, 
Verbrugge, Shankweiler, & Edman, 1979). It has been suggested that one reason
for this perceptual advantage may be that the dynamic acoustic structure of
syllables is a unique source of vowel information (Strange et al., 1976;
Strange, Jenkins, & Johnson, 1983). Our observation of an association between
offglide proportion and the tenseness feature is certainly consistent with 
this view. 

IV. Summary and Conclusions 

A stable, interpretable individual differences scaling solution was found 
for subjects' similarity judgments regarding a set of American English vowels. 
This solution had three dimensions which corresponded, respectively, to the
linguistic features of advancement, height, and tenseness. Those correspond- 
ences provide particularly strong evidence for the perceptual significance of 
the features due to the determinacy of individual differences scaling. 

While the results regarding the advancement and height features confirm
expectations based on a number of previous scaling studies, recovery of a
tenseness dimension is more surprising. One reason for its recovery in the
present instance may have to do with the individual differences scaling method
itself. Across subjects, there was wide variability in the perceptual
salience of tenseness, particularly among those who rated isolated vowels
(Rakerd, 1984). With individual differences scaling, this variability was
manifest in the different weighting that each subject attached to D3. However,
had the data been averaged over subjects prior to analysis, as required by many
scaling methods, it is likely that the variability would have made it
impossible to recover a tenseness dimension. It may also be relevant that we
instructed subjects to attend to those aspects of the vowel sounds that seemed
to them to distinguish words in English. Previous investigators (Carlson &
Granström, 1979; Klatt, 1979) have reported that an instruction of this type
can strengthen the linguistic character of subjects' perceptual judgments.

There were two noteworthy findings regarding correlations between the
scaling results and acoustic parameters of the vowel stimuli. The first was
that vowel duration was not significantly correlated with the tenseness
dimension in /dVd/ context. Hence, the emergence of this dimension,
particularly in the separate analysis of the consonantal-context condition,
cannot be attributed to subjects having attended to durational differences
among the vowels.

The second observation was that in /dVd/ context tenseness was
significantly correlated with offglide proportion. Tense vowels had an internal
syllable structure in which the offglide constituted 50% or less of the
vocalic region. For lax vowels, the offglide made up 60% or more of the vocalic
region. This finding is similar to one reported by Lehiste and Peterson

Although this is most commonly the case, and was the case in the present
study, each of the several data matrices submitted to an individual
differences scaling analysis need not represent the performance of a single
subject. As alternatives, there could, for example, be one matrix for each of
the sev-

lus presentation. They directed presentation of the triad of stimuli for each
trial by pressing three different buttons on a computer terminal.

'The term advancement is used to be consistent with the earlier work of
Singh and Woods (1970), and with Rakerd (1984). An alternative, and perhaps
more common term for this feature would be backness (Ladefoged, 1975).

PERCEPTUAL COHERENCE OF SPEECH: STABILITY OF SILENCE-CUED STOP CONSONANTS*



Bruno H. Repp 



Abstract. A series of experiments was conducted to examine the
perceptual stability of stop consonants cued by silence alone, as
when [s]+silence+[laet] is perceived as "splat." Following a
replication of this perceptual integration phenomenon (Exp. 1), attempts
were made to block it by instructing subjects to disregard the
initial [s] and to focus instead on the onset of the following signal,
which was varied from [plaet] to [laet]. However, these
instructions had little effect at short silence durations (Exp. 2), and
they reduced stop percepts for only two subjects at longer silence
durations (Exp. 3). That is, subjects were generally unable to
dissociate the [s] noise from the following signal voluntarily and
thus to perceive the silent interval as silence rather than as a
carrier of phonetic information. A low-uncertainty paradigm
facilitated the task somewhat (Exp. 4). However, when the [s]
frication was replaced with broadband noise (Exp. 5), listeners had no
trouble at all in the selective-attention task, except at very short
silence durations (< 40 ms). This last finding suggests that,
except for the shortest durations, the effect of silence on phonetic
perception does not arise at the level of psychoacoustic stimulus
interactions. Rather, the results support the hypothesis that
perceptual integration of speech components, including silence, is a
largely obligatory perceptual function driven by the listener's
tacit knowledge of phonetic regularities.

When listening to speech we perceive a coherent stream of sound, not a 
sequence of clicks, whistles, buzzes, and hisses. In view of the many abrupt 
changes of excitation and spectral structure that take place in normal speech, 
this apparent auditory coherence might seem like a remarkable perceptual 
accomplishment. However, it may well reflect the fact that the ordinary 
listener's attention is not focused on the detailed physical properties of the 
speech signal but on the underlying, linguistically relevant information. 
That is, auditory coherence of speech may be inferred from the perceived
actual continuity of certain underlying articulatory events. If so, then there
may be a more analytic level of perception that is sensitive to physical
discontinuities in the speech signal.

Speech does possess certain acoustic features that promote auditory 
coherence of otherwise disparate signal portions. For example, formant 
transitions have been considered to provide a kind of "perceptual glue" that
holds successive sounds together and helps preserve their temporal order (Cole
& Scott, 1973; Dorman, Cutting, & Raphael, 1975). This can hardly be the
whole story, however. If perceptual coherence and integration were determined
entirely by properties of the acoustic signal and their auditory transforms,
it would be impossible for a listener to decompose the speech signal into its
components deliberately. Nevertheless, this is possible, at least to a
certain extent, by focusing one's attention on the level of auditory qualities
(see, e.g., Pilch, 1979). For example, it is not difficult even for a naive
listener to attend selectively to the series of high-pitched hisses that
represent repeated occurrences of [s] in the speech stream. Under special
conditions, the perceptual isolation of such auditory components may be
facilitated: Cole and Scott (1973) rapidly repeated the syllable [sa] over and
over, and listeners soon reported hearing two separate streams of sounds, one
consisting of hisses (the fricative noises) and the other of syllables
sounding like [ta] (the vowel with its initial formant transitions). In this
unnatural situation, the segregation may take place at a relatively early
perceptual stage; similar "streaming" can be induced in repetitive
multicomponent nonspeech signals (Bregman, 1978).

Under more natural circumstances, the perceptual integration of certain
disparate acoustic components of speech may still not be completely
obligatory, though it reflects the normal mode of speech perception. If
perceptual integration of these speech components could be disengaged by
manipulating listeners' interpretation of the stimulus, this would suggest that
the normally perceived coherence of the speech signal is contingent on a
nonobligatory, central function characteristic of phonetic perception. If the
integrative function proved difficult to disengage, and if low-level
psychoacoustic interactions can be ruled out as the cause of the integration,
then the conclusion would be that perceptual integration of speech components
is not only a characteristic but also an obligatory function of phonetic
perception. 1

Evidence in favor of the hypothesis that certain types of perceptual
integration are speech-specific has been obtained in several recent studies
concerned with "trading relations" among acoustic cues. Thus, Best,
Morrongiello, and Robson (1981) have shown that, in noise-plus-sinewave analogs
of utterances of the type "say" versus "stay," the silent closure interval
following the noise and the onset frequency of the tone mimicking the first
formant (F1) both contribute to a stop consonant percept as long as the
stimuli are perceived as speech; however, when the stimuli are perceived as
nonspeech, the two acoustic cues are no longer integrated and are perceived as
unrelated auditory properties. In another study, Repp (1981) trained subjects
to discriminate the pitch of fricative noises preceding different vowels
containing one of two sets of formant transitions. There was no effect of the
vocalic context on the subjects' pitch judgments, even though the phonetic
identification of the fricative consonant was influenced by both vowel quality
and formant transitions. Furthermore, Dorman, Raphael, and Liberman (1979)
and Rakerd, Dechovitz, and Verbrugge (1982) experimented with utterances whose
precise phonetic interpretation depended on the duration of a silent closure
interval occurring at a syllable boundary. When either fundamental frequency
(Dorman et al., 1979) or the intonation contour (Rakerd et al., 1982) was
changed abruptly across syllables, the silence lost its perceptual effect.
Although spectral discontinuity could have played a role here, circumstantial
evidence suggests that subjects' perception of one versus two speakers or
utterances was responsible for the effect. Thus, all the studies cited
provide evidence for a central level of perceptual integration that can be
disengaged in at least three ways: by leaving the speech mode altogether, by
selectively attending to specific auditory properties of the speech signal, or
by perceiving a change of source or of linguistic structure.

In the present research, the focus is on the perceptual integration
occurring in [spl] clusters. Acoustic cues to the perception of a labial stop
consonant in this context include, first and foremost, an interval of silence
following the [s] noise (Bastian, Eimas, & Liberman, 1961; Fitch, Halwes,
Erickson, & Liberman, 1980), but also spectral changes in the fricative noise
and the amplitude contour at noise offset (Summerfield, Bailey, Seton, &
Dorman, 1981), the duration of the [s] noise (Repp, 1984c), the presence and
amplitude of a release burst following the silent closure (Repp, 1984b, 1984d),
formant onset frequencies and transitions in the following voiced portion
(Fitch et al., 1980; see also Bailey & Summerfield, 1980), and the duration
and possibly the amplitude envelope of the voiced portion (Repp, 1984c). Of
special interest here is the finding (Dorman et al., 1979) that a percept of
"split" can be elicited by simply concatenating an [s] noise and a [lit]
syllable, with an appropriate interval of silence (about 100-300 ms) in between;
in other words, in this context silence alone can be a sufficient cue for the
perception of a "p," as long as there are no contradictory cues from the
surrounding signal portions. Since neither of the energy-carrying signal
portions in isolation contains sufficient cues to a "p," and the silence by
itself naturally does not either, the stop consonant percept in this case is a
pure product of perceptual integration over time and thus constitutes an ideal
test case for our purposes.

The question addressed in the present study is: How robust is this
perceptual integration effect? That is, can a listener deliberately avoid the
stop consonant percept and hear the stimulus components the way they sound in
isolation, for example, as "s" followed by "lit"? This question is not
unreasonable because a stop cued by silence alone does not sound perfectly
natural and might be expected to be perceptually unstable, almost an illusion.
The answer to the question also bears on two contrasting hypotheses that have
been put forward to account for perceptual integration and cue trading
relations in phonetic perception (see Pastore, 1981; Repp, 1982): If these
phenomena are a function of purely psychoacoustic stimulus properties that
emerge in peripheral auditory processing, then it should be extremely difficult
to disengage them through acts of selective attention or linguistic
restructuring. If they are a function of speech-specific mechanisms, however,
it might be possible to change them by manipulating listeners' interpretation
of the stimulus, without necessarily leaving the speech mode. A positive result
would simultaneously refute the psychoacoustic hypothesis and support the
existence of a special integrative level of perception, whereas a negative
result, to be interpretable, would require an additional demonstration that
psychoacoustic interactions are not the cause of the subjects' difficulty.

Accordingly, this paper reports several attempts to "get rid of the stop"
in subjects' perception of [s]+silence+[lit] = "split" type utterances by
directing their attention to the stimulus portion following the silence. A
replication of the basic phenomenon of silence-cued stop consonant perception
(Exp. 1) is followed by experiments that investigate the effect of selective
attention instructions for stimuli with different absolute silence durations
(Exps. 2 and 3), and with some subsequent changes in test format to reduce
stimulus uncertainty (Exp. 4). Since, as will be seen, the stop consonant
percepts proved unexpectedly resistant to these manipulations, the last
experiment (Exp. 5) aimed at ruling out psychoacoustic interactions as the
cause of the silence-cued stop percept. On the assumption that this last
study succeeded in its aim, the conclusion will be that perceptual integration
of speech components, in this instance at least, is a relatively compulsory
function of phonetic perception.

Experiment 1 

Experiment 1 was an attempt to replicate an earlier striking
demonstration of the perceptual integration phenomenon of interest, owing to
Dorman et al. (1979, Exp. 3). These authors concatenated natural [s] and [lit]
utterances that had been recorded in isolation and that were considered to
contain no traces of any [p]. When the silent interval between the stimulus
components was shorter than 60 ms, listeners uniformly reported "slit." At
silent intervals between 80 and 450 ms, however, listeners reported
predominantly "split," with a maximum of over 90 percent around 300 ms of
silence. This optimal closure interval was much longer than a typical [p]
closure in this context (about 90 ms; see Morse, Eilers, & Gavin, 1982);
moreover, it took as much as 650 ms of silence before subjects uniformly
reported hearing "s-lit" (i.e., "s" followed by "lit"), rather than "split."
Since the "p" percepts in such stimuli are sometimes not very convincing, a
replication of the Dorman et al. study seemed advisable, to verify that their
subjects' "p" percepts were not just phantoms.

The long optimal closure duration (300 ms) in the Dorman et al.
experiment may have been due to perceptual compensation for the absence of
other cues to stop manner. However, there is also the possibility that the use
of a wide range of closure durations (0-650 ms), combined with a higher
relative frequency of short intervals, promoted a bias toward reporting "split"
at atypically long closure durations. Therefore, two different stimulus ranges
were employed here to assess the effect of this variable on the "sl"-"spl" and
"spl"-"s-l" boundaries. The stimuli in this part of the experiment (1a) began
with a fricative noise that contained some positive stop manner cues and that
was also used in Experiments 2-4. To approximate the conditions of the Dorman
et al. (1979) study even more closely, the test employing a wide range of
closure durations was later repeated (1b) using a fricative noise without
positive stop manner cues.

Method 

Subjects. Nineteen paid volunteers served as subjects, 10 in Experiment
1a and 9 in 1b. They were Yale undergraduates and native speakers of American 
English. 

Stimuli. A female speaker recorded several repetitions of the utterance
[splaet] ("splat"). One good token was low-pass filtered (-3 dB at 9.6 kHz,
-55 dB at 10 kHz) and digitized at a 20 kHz sampling rate. Because this
speaker's fricative noises contained significant energy at frequencies above
10 kHz, which caused some digitization artifacts, digitization and subsequent
recording of audio tapes were done at half speed. The [s] noise was 125 ms
long. The silent closure interval and the initial 11.5 ms of the following
stimulus portion, corresponding to the labial release burst (and perhaps
including a weak first glottal pulse), were removed. The remaining portion in
isolation elicited over 90 percent "lat" responses (see Exps. 2 and 3,
pretest). Thus it did not seem to contain any sufficient cues to a preceding
labial stop. The fricative noise from [splaet], however, may have contained
such cues. Therefore, Experiment 1b used a fricative noise derived from an
utterance of [slaet] produced by the same speaker, 190 ms in duration. 2

Two identification tests were assembled for Experiment 1a. In one, the
[s] noise was followed by the [laet] portion at each of 14 different closure
durations: 0, 20, 40, 60, 80, 100, 150, 200, 250, 300, 400, 500, 600, and 700
ms. This test was also duplicated in Experiment 1b with the different [s]
noise. In the other test used in Experiment 1a, only the 9 closure durations
up to 250 ms were included. Each test contained 10 successive randomizations
of the stimuli, with interstimulus intervals (ISIs) of 2.5 s and interblock
intervals of 6 s. The stimulus sequences were recorded at half speed on audio
tape using high-quality equipment, with closure durations and ISIs at twice
their nominal values; thus they had the intended values at playback speed.
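
The test layout just described can be sketched as follows. This Python fragment is a hedged illustration, not a reconstruction of the original test-assembly software: the function names, the random seed, and the use of a shuffled copy of the stimulus list for each block are our assumptions. Only the closure durations, block count, intervals, and the half-speed recording factor come from the text.

```python
import random

CLOSURES_MS = [0, 20, 40, 60, 80, 100, 150, 200, 250, 300, 400, 500, 600, 700]
ISI_S = 2.5             # interstimulus interval at playback speed
INTERBLOCK_S = 6.0      # interblock interval at playback speed
N_BLOCKS = 10           # successive randomizations per test
TAPE_SPEED_RATIO = 0.5  # sequences were recorded at half speed


def build_test(closures_ms, n_blocks=N_BLOCKS, seed=0):
    """Return n_blocks independently shuffled copies of the stimulus list."""
    rng = random.Random(seed)  # the seed is an arbitrary illustrative choice
    blocks = []
    for _ in range(n_blocks):
        block = list(closures_ms)
        rng.shuffle(block)
        blocks.append(block)
    return blocks


def recorded_value(nominal, speed_ratio=TAPE_SPEED_RATIO):
    """Interval to record so that it has its nominal value at playback speed."""
    return nominal / speed_ratio


wide_range = build_test(CLOSURES_MS)
narrow_range = build_test([c for c in CLOSURES_MS if c <= 250])

print(len(wide_range), len(wide_range[0]), len(narrow_range[0]))
print(recorded_value(300), recorded_value(ISI_S), recorded_value(INTERBLOCK_S))
```

Recording at half speed doubles every temporal value on the tape, which is why a nominal 300 ms closure is laid down as 600 ms.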

Procedure. The subjects listened individually or in small groups over
TDH-39 earphones in a quiet room. They identified each stimulus in writing as
beginning with "sl," "spl," or "s-l" (i.e., "s" followed by silence and
"lat").

Results and Discussion 

The average percentage of stop (i.e., "spl") responses is plotted in
Figure 1 as a function of closure duration (on a logarithmic scale). Filled and
open circles represent the data from the two conditions of Experiment 1a. It
is evident that stimuli with short closure intervals were perceived as
beginning with "sl." The "sl"-"spl" boundary fell at about 70 ms of closure
duration. "Spl" responses were obtained for closure intervals ranging from 60
to 300 ms, with the peak occurring at 100-150 ms of silence. At longer closure
durations, an increasing number of "s-l" responses was obtained. 3 Truncation
of the stimulus range did not affect the "sl"-"spl" boundary but shortened the
"spl"-"s-l" boundary by about 80 ms. At closure intervals of 200 and 250 ms
combined, there were significantly fewer "spl" responses in the narrow-range
than in the wide-range condition (one-way repeated-measures ANOVA: F(1,9) =
26.25, p = .0006). The "spl"-"s-l" distinction is not very categorical and
was expected to be affected by stimulus range. The fixed "sl"-"spl" boundary,
on the other hand, suggests that the silence-cued "p" percepts at closure
durations below 150 ms were relatively stable and insensitive to range
effects.
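
One common way to locate category boundaries of this kind is to interpolate the 50 percent crossover of the response function. The Python sketch below illustrates that idea only; the text does not state how the boundaries reported here were computed, the response percentages are invented, the function name is ours, and the interpolation is done on a linear millisecond scale even though Figure 1 uses a logarithmic axis.

```python
def crossover(durations_ms, percent_stop, criterion=50.0, rising=True):
    """Linearly interpolate the first duration at which the response
    percentage crosses the criterion, going up (rising) or down."""
    points = list(zip(durations_ms, percent_stop))
    for (d0, p0), (d1, p1) in zip(points, points[1:]):
        if rising:
            crossed = p0 < criterion <= p1
        else:
            crossed = p0 >= criterion > p1
        if crossed:
            return d0 + (criterion - p0) * (d1 - d0) / (p1 - p0)
    return None  # the function never crosses the criterion


# Hypothetical percentages of "spl" responses (NOT the data of Figure 1).
durations = [0, 20, 40, 60, 80, 100, 150, 200, 250, 300]
percent_spl = [0, 0, 5, 30, 70, 85, 88, 80, 60, 40]

lower_boundary = crossover(durations, percent_spl, rising=True)
upper_boundary = crossover(durations, percent_spl, rising=False)
print(lower_boundary, upper_boundary)
```

Because three response categories were available, a lower and an upper boundary are estimated separately from the rising and falling flanks of the "spl" function.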

The results from Experiment 1b are represented by the triangles in Figure
1. They confirm that the fricative noise in Experiment 1a contained some
positive stop manner cues. The "sl"-"spl" boundary was at a longer silent
interval here (close to 100 ms), the maximum of "spl" responses was less
pronounced and occurred at longer silences (150-250 ms), and the subjects
experienced more uncertainty at the longest intervals, giving more "spl"
responses here than in Experiment 1a. All these differences are at least in
part due to the longer duration of the fricative noise used in Experiment 1b
(cf. Repp, 1984c), but spectral differences at noise offset may also have
played a role.

The general pattern of these results is consistent with the findings of
Dorman et al. (1979). That is, even without any strong stop manner cues in
the surrounding signal portions, "p" percepts are obtained in a certain range
of closure durations. The 70 ms boundary separating "sl" from "spl" responses
in Experiment 1a is very close to that obtained by Dorman et al. The results
of Experiment 1b resemble the Dorman et al. findings in terms of the optimal
closure duration for hearing "p"; they suggest that listeners need
exceptionally long closure intervals for stop perception when closure duration
is the sole stop manner cue, perhaps to compensate for the absence of other
cues. The optimal closure duration in Experiment 1a, however, is shorter than
in the Dorman et al. study, and so is the longest closure at which "p"
percepts were still obtained. These results are somewhat closer to reflecting
the typical closure durations observed in natural speech.

Figure 1. Percent stop (i.e., "spl") responses as a function of closure
duration in Experiments 1a (filled and open circles) and 1b (triangles).
The open circles represent the results from the condition
with a reduced range of closure durations.

Experiment 2 

Even though Experiment 1 demonstrated the perceptual reality of 
silence-cued stop consonants, it did not tell us how obligatory these percepts 
are. The fact that the percentage of "spl" responses did not reach 100 per- 
cent at any closure duration suggests a certain amount of ambiguity. Subjects 
may also have felt compelled to apply the "spl" response category supplied by 
the experimenter. How easy would it be to convince listeners that what they 
are hearing is real ly "s" followed by "lat," and not "splat"? The technique 
adopted to investigate this issue in the following experiments was to con- 
struct a continuum from [plaetj to [laet], to prefix it with an [s] noise plus 
a varying silent interval, and to instruct listeners either to identify the 
whole stimulus ("integrative" condition) or to ignore the [s] and identify on- 
ly the part following the silence ("analytic" or selective-attention condi- 
tion). Since tne test included clear [splaet] (i.e., [s]+silence+[plaet] ) 
stimuli, there was no pressure to give any stop responses to 
[s]+silence+[laet] stimuli. On the contrary, contrast among stimuli in the 
test should reduce any such tendencies. The analytic instructions were
reinforced by the use of the response "b" (actually, "bl") for the
syllable-initial labial stop, if one was perceived, as contrasted with "p" (actually,
"spl") in the integrative condition. 4 Note that the analytic instructions
required a perceptual reinterpretation within the linguistic domain, without
leaving the speech mode (although thinking of the [s] as some extraneous noise
might help). If the instructions were effective, fewer stop responses should
be obtained in the analytic than in the integrative condition at closure
durations beyond 100 ms, particularly for those stimuli whose final portion was
perceived as beginning with "l" in isolation.

The "stop generation effect" discussed so far (the introduction of a stop
percept by appropriate amounts of silence in the absence of any other
sufficient cues) may be contrasted with a "stop suppression effect" due to an
absence of a sufficient interval of silence in the presence of other sufficient
cues. Thus, earlier observations (e.g., Fitch et al., 1980; Mann & Repp, 
1980) lead to the expectation that stimuli perceived as beginning with "bl" in 
isolation will lead to "sl" responses when preceded by an [s] noise with
little or no silence in between. If this stop suppression effect reflected the
same higher-level, integrative mechanisms as the stop generation effect, and 
if analytic listening instructions were effective, then more stop responses 
should be obtained in the analytic than in the integrative condition at short 
closure durations, particularly for those stimuli whose final portion was
perceived as beginning with "bl" in isolation.

Thus, the strongest prediction for Experiment 2 is that silent closure 
duration will have a marked effect on stop perception in the integrative 
listening condition but no effect at all in the analytic condition: Stimuli 
should be labeled as if there were no preceding [s]. However, apart from the 
fact that it is more realistic to expect only a more or less pronounced tend- 
ency in the predicted direction, the stop generation and suppression effects 
may well be differentially sensitive to attentional strategies. The stop 
suppression effect, which results from signal components occurring in close 
succession, is much more likely to involve auditory interactions (such as
forward masking) than the stop generation effect, which results from components
that are more widely separated in time. If this notion is correct, then the
prediction should be that selective attention instructions, if effective, will 
lead to a reduction of stop percepts at longer silences but not to an increase 
of stop percepts at short silences. 

Method

Subjects. The same 10 subjects as in Experiment 1a participated.

Stimuli. A continuum from [plaet] to [laet] was constructed from the
source utterance used in Experiment 1a, [splaet]. The original 11.5 ms labial
release burst was truncated by 0, 2, 4, 7.5, or 11.5 ms, yielding five stimuli
intended to range perceptually from "blat" to "lat" in the absence of a
preceding [s]. 5 The cutpoints were placed at zero-crossings in the digitized
waveform. A brief pretest was assembled in which these five stimuli (without 
any preceding [s]) occurred 10 times in random sequence, with ISIs of 2.5 s. 

Two additional identification tests were assembled. In one, designed for 
integrative listening, each stimulus from the [plaet]-[laet] continuum was
preceded by [s] at silent intervals of 0, 40, 80, 120, and 160 ms, for a total
of 25 stimuli that were recorded 10 times in random sequence with ISIs of 2.5 
s. The other test, designed for analytic listening, contained 10 random se- 
quences of the same 25 stimuli plus 10 x 2 replications of the 5 stimuli with- 
out a preceding [s] interspersed among them, resulting in 10 35-item blocks. 
The "no-[s]" stimuli were intended to remind the subjects of the stimulus por- 
tion to attend to, and perhaps to facilitate selective attention. 
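
As a quick arithmetic check on the design just described, the following Python
sketch enumerates the two identification tests. The variable names are
illustrative and do not come from the original study; only the numeric values
are taken from the text.

```python
from itertools import product

# Values taken from the Stimuli paragraph above; names are illustrative.
burst_truncations_ms = [0, 2, 4, 7.5, 11.5]   # stimuli 1-5 on the continuum
silent_intervals_ms = [0, 40, 80, 120, 160]   # closure silence after [s]
n_blocks = 10                                 # random sequences per test
no_s_repeats_per_block = 2                    # "10 x 2 replications"

# Integrative test: every continuum member at every silent interval.
stimulus_numbers = range(1, len(burst_truncations_ms) + 1)
with_s = list(product(stimulus_numbers, silent_intervals_ms))

# Analytic test: the same items plus the interspersed no-[s] stimuli.
no_s_per_block = no_s_repeats_per_block * len(burst_truncations_ms)
analytic_block_size = len(with_s) + no_s_per_block

integrative_trials = n_blocks * len(with_s)
analytic_trials = n_blocks * analytic_block_size

print(len(with_s), analytic_block_size, integrative_trials, analytic_trials)
```

The counts agree with the text: 25 stimuli with [s] and 35 items per block in
the analytic test.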

Procedure. All subjects listened first to the tapes of Experiment 1a.
Subsequently, in the same session, the integrative listening test was
presented. As in Experiment 1a, the task was to label the stimuli as beginning with
"sl" or "spl." The pretest followed, with instructions to label the stimuli
as beginning with "bl" or "l." Finally, the analytic listening test was
presented, in which the labels "bl" and "l" were again to be used. Subjects
were told to ignore the [s], if present, to the best of their ability. They 
were informed about the structure of the stimuli and about the perceptual ef- 
fect to be avoided. 

Results and Discussion 

The [plaet]-[laet] continuum was perceived as intended. In the pretest,
the average percentages of "bl" responses to the 5 stimuli were 100, 100, 90,
9, and 3, respectively. (Note the listeners' remarkable sensitivity to the
3.5 ms release burst cutback occurring between stimuli 3 and 4; for comparable 
results, see Repp, 1984b: Exp. 1.) The same no-[s] stimuli interspersed in 
the analytic listening test received 99, 99, 92, 24, and 20 percent "bl" re- 
sponses, respectively. Thus, stimuli 4 and 5 were sometimes perceived as 
beginning with "bl" in this environment, but they still were clearly distin- 
guished from stimuli 1, 2, and 3, which sufficed for the purposes of this 
experiment. 

In both the integrative and analytic listening conditions, stimuli with 
no closure silence at all never elicited labial stop responses. Clearly, ana- 
lytic listening instructions were totally ineffective here — not an unexpected 
result. Therefore, those data were excluded from further analysis, reducing 
the number of closure durations to 4. Figure 2 shows the percentages of
labial stop responses in the two listening conditions as a function of closure
duration and of stimulus number on the continuum. The responses to no-[s] 
stimuli in the analytic test are plotted on the far right. 




Figure 2. Percent stop responses in the integrative and analytic conditions 
of Experiment 2, separately for the five stimuli from the 
[plaet]-[laet] continuum. Data for the 0 ms closure duration are 
omitted. 



It is evident that the response patterns in the integrative and analytic 
conditions were highly similar. A repeated-measures ANOVA showed the expected 
significant main effects of closure duration and stimulus continuum, and also 
an interaction between these factors (all p's < .0001), but no significant
main effect of conditions. The conditions by closure duration interaction was
significant, F(3,27) = 5.45, p < .005, due to a slight reduction in labial
stop percepts at the shorter closure durations in the analytic condition
relative to the integrative condition, and a relative increase at the longest
closure duration, where perceptual segregation of the [s] noise from the rest of
the stimulus might have been expected to be relatively easier. This pattern 
of results is the opposite of the predicted one. Thus there is no evidence 
that the analytic listening instructions had the desired effect. Instead of 
selectively attending to the stimulus portion following the silence, the sub- 
jects apparently responded by parsing off the "s" and changing the "p" to "b" 
in their phonological (or orthographic) representation of the whole stimulus.
The peak rate of labial stop responses to stimuli 4 and 5 preceded by [s]
(about 70 percent at 120 ms of silence in both conditions) clearly exceeded 
that for stimuli 4 and 5 in isolation, but was lower than that in Experiment 
1a (about 90 percent). This may suggest unstable "p" percepts, but the re- 
sults of the analytic condition do not bear this out. That is, the instabili- 
ty was only in the choice of response from one trial to the next, not in the 
percept on which it was based. 

It is interesting to note that stimuli 1, 2, and 3, which tended to give 
very similar results at longer closures and in isolation (probably due to a 
ceiling effect), elicited different response rates at the 40 ms closure dura- 
tion. In fact, an orderly trading relation can be seen between stimulus num- 
ber (i.e., degree of release burst truncation) and silent closure duration, as 
previously demonstrated by Repp (1984b, Exp. 1) for alveolar stops in the 
"say"-"stay" contrast. The "sl"-"spl" boundary (50 percent intercept) ranged
from approximately 30 ms (stimulus 1, extrapolated) to over 90 ms of silence 
(stimulus 5) — a remarkable range, considering that the release burst being 
truncated was only 11.5 ms long. A lot of silence was needed to compensate 
for the loss of a small piece of plosive noise. 
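
The 50 percent intercept that defines the category boundary can be estimated
by linear interpolation between adjacent closure durations. The sketch below
shows one plausible way to do this; it is not necessarily the procedure used
in the study. The function name and the response percentages in the example
are hypothetical, and the function does not extrapolate beyond the tested
range (as was done for stimulus 1).

```python
def boundary_50(durations_ms, percent_stop):
    """Closure duration at which stop responses cross 50 percent.

    Linear interpolation between the first pair of adjacent points that
    bracket 50 percent. Returns None when there is no crossing within the
    tested range.
    """
    points = list(zip(durations_ms, percent_stop))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        brackets = (y0 - 50.0) * (y1 - 50.0) <= 0
        if brackets and y0 != y1:
            return x0 + (50.0 - y0) * (x1 - x0) / (y1 - y0)
    return None


# Hypothetical identification function for one stimulus (not study data).
durations = [40, 80, 120, 160]
example_percent_stop = [20.0, 60.0, 90.0, 95.0]
print(boundary_50(durations, example_percent_stop))  # 70.0
```

Applying such a function to each of the five stimuli in turn would give the
kind of boundary shift described above.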



Experiment 3

Experiment 2 suggests that, at least without special training, subjects
are unable to dissociate an [s] noise perceptually from the following speech
signal. In part, this may have been due to the relatively short silent
intervals used. Experiment 3 examined the same issue at longer closure
durations, where selective attention to the stimulus portion following the [s]
might be facilitated by the increased temporal separation and the consequent
reduction of any potential auditory stimulus interactions across the silence.
Experiment 3 used only an analytic listening condition, taking the integrative
identification data of Experiment 1a for comparison. Since the closure
intervals used were all in the range beyond the stop suppression effect, the
expectation was that stop responses would be reduced relative to Experiment 1a
and would approximate the percentages for no-[s] stimuli.

Method

Subjects. Ten paid volunteers participated, four of whom had taken part
in Experiments 1a and 2.

Stimuli. The test sequence contained the five stimuli from the
[plaet]-[laet] continuum preceded by the [s] noise at silent intervals of 100,
150, 200, 250, 300, 400, and 500 ms. The resulting 35 stimuli were augmented
by 4 repetitions of the 5 stimuli without preceding [s], and all 55 stimuli
were recorded in 5 randomized orders with ISIs of 2.5 s. The pretest of
Experiment 2 (no-[s] stimuli only) was also used.

Procedure. Six of the subjects first listened to the pretest, labeling
each stimulus as beginning with "bl" or "l." (The four remaining subjects had
received the pretest in an earlier session in connection with Experiment 2.)
Following the pretest, all subjects went through Experiment 4 (described
below) before embarking on Experiment 3. The instructions were to ignore the
initial [s], if present, and to label each stimulus as beginning with either
"bl" or "l." The subjects were informed about the purpose of the experiment
and about the nature of the stimuli.






Results and Discussion

The average percentages of labial stop responses to the five stimuli in
the pretest were 100, 100, 89, 16, and 10, respectively. For the same stimuli
in the analytic identification test, subjects' average percentages were 99,
99, 78, 13, and 8. Unlike Experiment 2, there was no increase in "bl"
responses to stimuli 4 and 5 in the environment of stimuli with initial [s],
perhaps because there were no contextual stimuli that sounded like "slat." 

Figure 3 plots "bl" responses to stimuli preceded by [s] as a function of
silent closure duration. The response percentages for the interspersed no-[s]
stimuli are plotted on the far right. Several patterns are evident in the
results: (1) Stimuli 1, 2, and 3 elicited fewer stop responses when preceded by
[s] than when presented in isolation. (2) At closure durations shorter than
300 ms, stimuli 4 and 5 elicited more stop responses when preceded by [s] than
when presented in isolation. (3) The percentage of stop responses increased
as closure duration decreased, reaching a peak at 150 ms for stimuli 3, 4, and
5. Responses to stimuli 1 and 2, on the other hand, were not sensitive to
changes in closure duration. In the analysis of variance, this was reflected
in a significant closure duration by stimulus number interaction, F(24,216) =
2.09, p < .005.
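
The degrees of freedom reported for the interaction tests can be checked
against the designs. The Python sketch below assumes a fully within-subject
design in which each interaction is tested against its own
subject-by-interaction error term; that assumption, and the function name, are
not stated in the text.

```python
def rm_interaction_df(levels_a, levels_b, n_subjects):
    """Degrees of freedom for an A x B interaction in a fully
    within-subject ANOVA, tested against the A x B x subjects term."""
    effect_df = (levels_a - 1) * (levels_b - 1)
    error_df = effect_df * (n_subjects - 1)
    return effect_df, error_df


# Experiment 3: 7 closure durations x 5 stimuli, 10 subjects.
print(rm_interaction_df(7, 5, 10))  # (24, 216)

# Experiment 2: 2 conditions x 4 closure durations (0 ms excluded), 10 subjects.
print(rm_interaction_df(2, 4, 10))  # (3, 27)
```

Both results match the reported F(24,216) and F(3,27).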




Figure 3. Percent stop responses in the analytic task that constituted 
Experiment 3.

The main result of this study is the increase in stop responses when
[laet]-like stimuli were preceded by [s] at closure durations of less than 300
ms. This increase resembles the results of Experiment 1a, obtained with
stimulus 5 in a standard (integrative) labeling task. Thus, as in Experiment 2,
subjects were not able to get rid of stop percepts by ignoring the [s]
precursor and focusing their attention on the onset of the stimulus portion
following the closure silence. Some measure of success in the selective-attention
task is indicated, perhaps, by the fact that stop responses to stimulus 5 
preceded by [s] reached a maximum of only 5^ percent, whereas the same stimu- 
lus elicited as much as 90 percent stop responses in Experiment 1a. However, 
in the integrative condition of Experiment 2, there was also a relatively low 
percentage of stop responses to stimulus 5 at comparable closure durations 
(about 60 percent). Moreover, since subjects had been told that a preceding 
[s] tended to generate labial stop percepts that were to be avoided, a bias 
against responding "bl" may have operated. This is strongly suggested by the 
lowered rate of "bl" responses (around 80 percent) to stimuli 1 and 2 preceded
by [s], which certainly would have been labeled "spl" 100 percent of the time
in an integrative task. Thus, the effect of the selective-attention
instructions on perceptual organization may actually have been rather small (see
discussion of Figure 5 below). 

This conclusion must be qualified immediately, however, because closer 
inspection of the data revealed considerable individual differences (in con- 
trast to Experiment 2). In particular, there were 2 (out of 10) subjects who 
appeared to be totally successful in ignoring the [s] precursor, whose label- 
ing responses were not influenced by closure duration, and who exhibited no 
response bias. 6 Four or five other subjects showed patterns of which Figure 3 
is representative, and the remaining subjects exhibited idiosyncratic patterns 
and showed large response biases against "bl." These individual differences 
are reminiscent of those observed by Repp (1981) in a study that required 
listeners to dissociate a fricative noise perceptually from a following
vocalic portion. The success of two subjects in the present study suggests that
analytic listening to speech components is not an impossible task, at least
not when the closure durations are fairly long. These observations are
consistent with the hypothesis that silence-induced stop percepts are products of
a higher-level integrative process, and not of psychoacoustic interactions 
among stimulus components. Nevertheless, the fact remains that the perceptual 
strategy for performing the selective attention task was not available to most 
listeners, even though they had received a moderate amount of training by per- 
forming the low-uncertainty task of Experiment 4 before Experiment 3. 



Experiment 4

Experiments 2 and 3 have provided only very limited evidence that
subjects can perceptually dissociate the two stimulus components, even at
relatively long temporal separations. In part, subjects' difficulties in carrying
out the selective-attention instructions may reflect ingrained habits of
integrative phonetic processing when listening to speech. At very short temporal
separations, however, psychoacoustic interactions among the stimulus
components may come into play, and these interactions may be truly impossible to
disengage by acts of selective attention or other perceptual strategies. To
investigate this issue further, Experiment 4 employed a low-uncertainty
paradigm to test subjects' ability to distinguish between clear instances of
[plaet] and [laet] when preceded by [s] at various fixed intervals of silence.
It was expected that a reduction in stimulus uncertainty would facilitate the
selective attention task.

Method

Subjects. The same 10 subjects as in Experiment 3 participated.

Stimuli. Only stimuli 1 and 5 from the [plaet]-[laet] continuum were
used, as well as the [s] noise derived from the natural [splaet]. Seven
stimulus sequences were recorded, each containing 20 repetitions of stimuli 1 and
5 in random order, with ISIs of 2 s. In the first sequence, there was no
preceding [s] noise. In the subsequent sequences, each stimulus was preceded
by [s] at a fixed silent interval. Over these six sequences, the closure
interval decreased from 500 to 200, 100, 50, 20, and finally 0 ms.

Procedure. The subjects were told that, in each block of 40 stimuli,
half were "blat" and half were "lat." They were asked to label each stimulus
as beginning with "bl" or "l," guessing if necessary, and to ignore the [s]
precursors. Note that Experiment 4 preceded Experiment 3.

Results and Discussion 

Figure 4 shows the effect of [s] precursors at various closure durations, 
with the no-[s] stimuli on the far right. Labeling of the two stimuli without 
the [s] precursor was virtually perfect. Reading the graph from right to 
left, it can be seen that discrimination of stimuli 1 and 5 (in terms of the 
difference in "bl" responses) was unaffected at the 500-ms interval, then de- 
creased but stayed fairly high up to the 50 ms separation; then it declined 
rapidly and reached chance at 0 ms (51.5 percent correct responses in terms of 
identification of stimulus 1 as "bl" and of stimulus 5 as "l"). Although the
subjects had been encouraged to guess even if all stimuli sounded like "lat,"
few followed these instructions. The low percentage of stop responses at the
shortest closure durations reflects the fact that [s] + [plaet] sounds like
"slat" when there is no closure silence. 7 

Individual differences were evident in this task also. Three subjects,
including the two who stood out in Experiment 3, performed almost perfectly
down to 20 ms of silence, where they suddenly gave only "l" responses and thus
performed at chance level. The other subjects were more error-prone at silent
intervals of 50-200 ms, and one subject seemed to reverse the response cate- 
gories. 

To determine how subjects' performance in the low-uncertainty task of
Experiment 4 compared with the performance obtained in Experiments 2 and 3, d'
values for the stimulus 1 vs. stimulus 5 discrimination (treating the binary
category labels as if they were "yes" and "no" responses in a signal detection
task) were computed from the overall response percentages, a rough measure
that, however, is adequate for an informal graphic comparison. 8 These d'
values are plotted in Figure 5. The figure suggests that discrimination was more
accurate in Exp. 4 than in Exp. 3, presumably due to the paradigm that reduced 
stimulus uncertainty and thus facilitated selective attention. It also seems, 
however, that at silent intervals in the range of 40-100 ms, there was no 
difference in accuracy between Exps. 2 and 4. (It is also clear that there 
was no difference between the integrative and analytic conditions in Exp. 2.) 
Since performance in Exps. 2 and 3 matched at intervals of 100-160 ms, there 
is no reason to assume that the subjects in Exp. 2 were especially accurate.
Rather, it seems that the procedure of Exp. 4, though it was beneficial at
longer closure durations, conferred no advantage in the vicinity of the
"sl"-"spl" category boundary (between 40-100 ms of silence; see Fig. 1).

Figure 4. Percent stop responses in the low-uncertainty task that constituted
Experiment 4 (stimuli 1 and 5 only).

Figure 5. Discriminability of stimuli 1 and 5, expressed as d', in
Experiments 2-4.
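
The d' measure used in this comparison can be sketched in Python as follows.
This is a minimal illustration of the standard signal detection formula,
treating "bl" responses to stimulus 1 as hits and "bl" responses to stimulus 5
as false alarms. The clipping of extreme proportions (the `floor` parameter)
is an assumption added here to keep the z-transform finite; the text does not
say how proportions of 0 or 1 were handled. The example proportions are
hypothetical, not study data.

```python
from statistics import NormalDist


def d_prime(p_bl_stim1, p_bl_stim5, floor=0.005):
    """d' = z(hit rate) - z(false-alarm rate).

    p_bl_stim1: proportion of "bl" responses to stimulus 1 (hits).
    p_bl_stim5: proportion of "bl" responses to stimulus 5 (false alarms).
    floor: assumed clipping value so that 0 and 1 stay finite.
    """
    z = NormalDist().inv_cdf

    def clip(p):
        return min(max(p, floor), 1.0 - floor)

    return z(clip(p_bl_stim1)) - z(clip(p_bl_stim5))


# Hypothetical proportions for illustration.
print(round(d_prime(0.90, 0.10), 2))  # about 2.56
print(d_prime(0.50, 0.50))            # 0.0: no discrimination
```

Because the values are computed from pooled response percentages rather than
per subject, they serve only for the informal graphic comparison described
above.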

This observation, together with subjects' extremely poor performance at
very short closure durations, is compatible with the hypothesis that the stop 
suppression effect, and with it the "sl"-"spl" category distinction, rests on 
a psychoacoustic interaction that cannot be disengaged through selective 
attention. The silence-cued "p" percepts (the stop generation effect) at 
intervals beyond 100 ms, on the other hand, are sensitive, to some extent, to 
listeners' strategies and thus may represent a higher-level integrative
process peculiar to phonetic perception. The comparisons in Figure 5 suggest,
furthermore, that discriminative sensitivity is heightened in the category 
boundary region, whereas discrimination at silent intervals characteristic of 
strong "p" percepts (i.e., within-category discrimination) is less accurate 
and requires the overcoming of integrative phonetic processing strategies. 
This pattern of results is similar to that obtained in many studies of 
categorical perception (see Repp, 1984a).

Experiment 5 

The hypothesis that the "sl"-"spl" boundary (more specifically, the
suppression of a stop percept at short closure durations) has a psychoacoustic
origin, although consistent with the data so far, is contradicted by a recent 
study of Pastore, Szczesiul, and Rosenblum (1984). These researchers employed 
binaural phase shifts to differentially lateralize the [s] and [plIt]
components of their "slit"-"split" stimuli. This manipulation left the category
boundary (located at 68 ms of closure silence in their study) completely
unaffected. The authors argued that differential lateralization should reduce
psychoacoustic interactions between the stimulus components and that,
therefore, the absence of an effect suggests that the "sl"-"spl" boundary does not
rest on a psychoacoustic criterion. However, apart from the possibility that 
the phase shift technique was too weak a manipulation to remove psychoacoustic 
interactions, these results do not rule out such interactions at closure 
intervals shorter than the boundary value. 

An additional experiment probing the possible psychoacoustic basis of 
silence-cued stop consonant perception is also necessitated by the fact that 
Experiments 2-4 were relatively unsuccessful in disengaging subjects'
integrative processing strategies. The evidence for a higher-level, speech-specific
basis for the stop generation effect is suggestive at best, and a demonstra- 
tion that psychoacoustic interactions are not involved would strengthen the 
argument considerably. 

For the present stimuli, in which the difference between [plaet] and
[laet] rests entirely on a brief release burst, the most obvious
psychoacoustic hypothesis is that, at short temporal separations, the burst suffers from
forward masking by the preceding fricative noise, and therefore becomes diffi- 
cult to detect. If so, then this masking effect should occur also when a 
burst of white noise is substituted for the [s] frication, provided that the 
energy of the white noise is not substantially below that of the frication. 
From the viewpoint of phonetic perception, however, the white noise is less 
speech-like and therefore should be more easily filtered out in a 
selective-attention task. If the "sl"-"spl" boundary does not rest on a
psychoacoustic interaction, subjects should be more successful in identifying
"blat" and "lat" when white noise replaces the [s] precursor.



Method

Subjects. The same 9 subjects as in Experiment 1b participated.

Stimuli. The five stimuli from the [plaet]-[laet] continuum were again
used. Instead of a natural [s] noise, however, a burst of white noise was
used as a precursor. The white noise was recorded from a General Radio 1390-A
random noise generator, low-pass filtered, and digitized at half speed at a 20
kHz sampling rate. It differed from the [s] noise used previously (Exp. 1a
and Exps. 2-4) in three respects: (1) Its duration was 200 ms, versus 125 ms
for the [s] noise. (2) It was gated on and off abruptly, whereas the [s] 
noise had gradual on- and offsets. (3) It had a flat spectrum, whereas the 
spectrum of the [s] noise had a pronounced peak at about 8.6 kHz, which 
projected by about 20 dB above a relative energy plateau ranging from 4 to 10 
kHz. The spectral energy of the white noise matched that of the plateau; its 
energy was higher than that of the [s] noise below 4 kHz and above 10 kHz, and 
lower between about 8-9 kHz. Its energy at offset was considerably higher 
than that of the fricative noise across the whole spectrum. All these differ- 
ences led to the expectation that the white noise would have a more pronounced 
forward masking effect than the [s] noise, if such a psychoacoustic effect is 
involved at all. On the other hand, relatively long duration, abrupt offset, 
and flat spectrum are all uncharacteristic of natural fricative noises
preceding a stop closure. 9

The stimulus tape matched that of the analytic condition in Experiment 2. 
That is, silent intervals ranged from 0 to 160 ms, and "no-noise" stimuli were 
interspersed. 

Procedure. All subjects listened first to the tape of Experiment 1b (an
integrative labeling task) and then to the pretest, as used in Experiments 2
and 3 (stimuli without preceding noise). Instructions for the main test were
the same as in the analytic condition of Experiment 2: Ignore the noise and
label the stimuli as beginning with "bl" or "l."

Results and Discussion 

Figure 6 shows the results, which are strikingly different from those of 
Experiment 2 (cf. Fig. 2, right-hand panel). Over the range from 40-160 ms of 
silence, the white noise precursor had no effect at all on subjects' ability 
to identify the stimuli from the [plaet]-[laet] continuum, except for
introducing a slight bias against stop responses. 10 In particular, the white 
noise did not induce any stop percepts when it preceded stimuli 4 and 5. Only 
when there was no silent interval between the noise and the speech did the 
noise exert a perceptual effect, rendering stimuli 2-5 indiscriminable, while 
stimulus 1 continued to receive a higher rate of stop responses. Note also 
that, in this condition, subjects were equally willing to respond "bl" or "l,"
whereas in the corresponding condition of Experiment 2 (not shown in Fig. 2)
responses were exclusively "l." This suggests that the subjects in Experiment
5 considered the white noise as an extraneous signal that might obscure stop 
consonant cues present in the speech signal, whereas the subjects in Experi- 
ment 2 perceived the [s] noise as part of the utterance, even when asked not 
to do so, and thus were unwilling to consider the possibility of an inaudible 
stop consonant.

Figure 6. Percent stop responses in Experiment 5.



It seems extremely unlikely that spectral or other properties of the 
white noise were responsible for its reduced masking power, since it was a 
more powerful signal than the [s] noise by most acoustic criteria. Although 
the [s] noise was more intense between 8 and 9 kHz, the spectral peaks of the
labial release burst were in a region (below 4.5 kHz) where the white noise
exceeded the [s] noise in energy. Therefore, the results suggest that
psychoacoustic interference (i.e., forward masking) was involved only at the 
very shortest closure intervals (less than 40 ms). Consequently, the reduc- 
tion in stop responses when an [s] noise precedes [plaet] stimuli by 40-80 ms 
(see Fig. 2) probably does not represent psychoacoustic interference, but 
rather a specifically phonetic effect reflecting the listener's tacit
knowledge about the minimal permissible duration of stop consonant closures in this
context. Apparently, listeners are compelled to apply this knowledge as long
as they perceive a coherent stream of speech. This conclusion is consistent
with that reached by Pastore et al. (1984), and it suggests that the two
effects of closure silence (stop suppression at short durations, stop generation
at longer durations) can be accounted for within a single theoretical
framework, that of perception in the "speech mode" (Liberman, 1982; Repp, 1982).

Summary and Conclusions

The present series of studies addressed the question of the origin of the
auditory coherence of speech by focusing on one particularly striking
phenomenon, that of silence-cued labial stop consonants in fricative-liquid context.
This phenomenon illustrates both the coherence of acoustically heterogeneous 
speech components in general and the perceptual integration of disparate cues 
to the perception of a particular phonetic contrast. Between the fricative 
noise and the resonances resulting from production of the liquid consonant, 
there is an abrupt change in the nature and location of the sound source (from 
voiceless and dental to voiced and laryngeal) and in spectral composition 
(from higher to lower frequencies). Nevertheless, with or without an 
intervening brief silent interval, listeners usually perceive both sounds as 
part of a coherent speech stream. This coherence in turn gives rise to a stop 
consonant percept when a silent interval of appropriate duration (roughly, 
80-200 ms) is present. Thus the silence itself becomes part of the speech 
stream; rather than interrupting the continuity and contributing to the 
perceptual segregation of acoustically disparate signal components, the 
silence functions as a carrier of phonetic information. Only when the silence 
duration clearly exceeds the acceptable limits of a stop consonant closure 
does it lead to perceptual segregation of the signal components. 

It was hypothesized that the integrative function that gives rise to 
these phenomena is a characteristic of perception in the speech mode — that is, 
of perceiving the information that is most useful for linguistic communica- 
tion. One way of testing this hypothesis would be to lead listeners to per- 
ceive the same stimuli as either speech or nonspeech. Some evidence favoring 
the hypothesis has already been obtained using variants of that method (Best 
et al., 1981; Repp, 1981). A somewhat different approach was taken here. It 
was argued that, if perceptual integration of the form studied here is a
speech-specific function, it might be possible to influence its operation by
directly manipulating the listeners' interpretation of the speech stimulus,
staying entirely within the speech mode. The success of this approach was not
guaranteed, of course, since manipulation of listeners' strategies through
instructions may simply be ineffective. In the absence of a convincing 
psychoacoustic explanation for the perceptual integration of speech compo- 
nents, however, negative findings may tell us that certain perceptual strate- 
gies are not easily modified or abandoned — not that they are not 
speech-specific. 

In a series of experiments (Exps. 2-5) following a basic demonstration of 
silence-cued stop consonants (Exp. 1), an attempt was made to alter subjects'
interpretation of the stimulus by instructing them to mentally separate the
fricative noise from the following signal portion. The relative
ineffectiveness of the selective-attention instructions with stimuli of seemingly minimal
acoustic coherence is interpreted as evidence for the relative stability of
the perceptual integration function. Experiment 3 indicated, however, that
some subjects can be successful in this task, and Experiment 4 showed that a
low-uncertainty paradigm also facilitates selective attention. These results 
parallel those obtained in studies of categorical perception (see Repp, 1984a, 
for a review), where subjects frequently need to disengage or ignore another 
basic function of the speech mode, that of phonetic classification, in order 
to discriminate speech stimuli. In these studies, it seems that success in 
within-category discrimination often requires perceptual strategies that 
operate outside the speech mode. The present task, too, could in principle 
have been accomplished by listening specifically for the release burst, though
there was no evidence that the subjects used this "auditory" strategy. Rath- 
er, the few successful subjects appeared to be able to do what the instruc- 
tions asked for: to ignore the fricative noise and listen to the remainder of 
the stimulus as speech, a skill that trained phoneticians presumably would
have in their repertoire. 

One way of ignoring a fricative noise is to think of it as a nonspeech 
hiss arising from a source outside the speaker's vocal tract. That this 
strategy could be effective is clear from Experiment 5 which, by substituting 
a nonspeech noise for the frication, actually created the situation that sub- 
jects otherwise might try to imagine. The ease with which the subjects car- 
ried out the selective-attention instructions in this situation argues against 
a psychoacoustic account of perceptual integration and of the effect of the 
silent interval on stop consonant perception. This latter effect has two
aspects, which were termed "stop suppression" (short intervals) and "stop
generation" (longer intervals). On the basis of the results of Experiment 5
it was concluded that both of these effects are likely reflections of
speech-specific perceptual criteria, with only the suppression effect at
extremely short closure silences having a psychoacoustic origin.11

In conclusion, then, the results of the present experiments are consist- 
ent with a theoretical view of speech perception that postulates a number of 
specific — though not necessarily unique — functions. These perceptual func- 
tions, which include the perceptual integration of speech components, are as- 
sumed to be driven by an internal representation of the regularities of spoken 
language. How this representation should be characterized and how it is ac- 
quired are fundamental questions for future research. 
Footnotes

1 The question posed here is similar in many ways to that underlying
categorical perception research (see Repp, 1984a), but the methodology is
different. Categorical perception experiments examine subjects' ability to
discriminate stimulus differences within phonetic categories; here, the focus
is on listeners' ability to ignore one part of a stimulus (a skill that may
play a role in some discrimination tasks). Both tasks are difficult because
listeners tend to adhere to their habitual mode of phonetic perception, which
is categorical and integrative. No claim is made here that this type of
perceptual mode is specific to speech; it is called "phonetic" only because
the stimuli happen to be speech. That being so, however, many specific
instances of perceptual integration may indeed be speech-specific, simply
because they have no parallels in other domains of experience.
2 To be sure, the [s] noise must not be too long, and its offset and the
[l] onset not too gradual; otherwise, no stop percepts will be obtained. The
presence of stop manner cues in the [s] noise was irrelevant in Experiments
2-4, because subjects' attention was directed toward the stimulus portion
following the silence. As far as that portion is concerned, it was sufficient
that it not elicit any stop percepts in isolation. No claim is being made
that either signal portion contained no cues whatsoever to stop consonant
perception (see also Footnote 6).

3 Some subjects, especially in Experiment 1a, spontaneously gave "sP-l"
responses, indicating that they detected stop manner cues in the frication,
while at the same time perceiving a gap between the [s] and the rest of the
stimulus. These responses were treated as equivalent to "s-l"; thus they are
not included in the "spl" percentages plotted in Figure 1.

4 The phonetic symbol [p] represents a voiceless unaspirated labial stop
consonant, which in English orthography is rendered as "p" in some contexts 
(e.g., following a voiceless fricative in the same syllable) but as "b" in 
others. Throughout this paper, phonetic symbols in brackets denote stimuli or 
the speaker's intentions, whereas orthographic symbols in quotes refer to re- 
sponses or the listeners' percepts. 

5 For the author and most subjects, excision of the natural labial release
burst in [plaet] resulted in elimination of the stop percept. Some listeners,
however, still claimed to hear a "b," which may reflect a special sensitivity
to weak coarticulatory cues in the [l] portion. These coarticulatory cues may
reside in spectral or amplitude properties of the signal immediately following
the release burst or, perhaps more likely, in the shorter duration of the [l]
as compared to one articulated in absolute utterance-initial position. One
additional subject in Experiment 2 and two additional subjects in Experiment 3
were excluded because they perceived all stimuli from the [plaet]-[laet]
continuum as "blat."

6 One of these two subjects had participated in Experiments 1a and 2. In
the labeling task of Experiment 1a, which used stimulus 5 of the
[plaet]-[laet] continuum, she gave 90 percent stop responses at closure
durations of 100 and 150 ms. In Experiment 2, for stimuli 4 and 5 with 120 and
160 ms of silence, she gave 63 percent stop responses in the integrative
condition, 70 percent in the analytic condition, and 0 percent when there was
no preceding [s] noise. In Experiment 3, however, she gave not a single stop
response to the same stimuli with silent intervals of 100 and 150 ms.
Clearly, she had discovered an effective selective attention strategy in Experiment
3, perhaps as a result of going through the task of Experiment 4 (where she
likewise did not give any stop responses in the comparable stimulus
conditions).

7 It might be noted that while the inexperienced subjects performed at 
chance level in the 0 ms condition, the author as a pilot subject obtained a
score of 85 percent correct. Thus, it is not impossible to discriminate the 
[plaet] and [laet] components in this condition, but a different perceptual 
strategy seems to be required (viz., listening for a certain difference in au- 
ditory quality caused by the presence versus absence of a release burst). 
Note that this strategy is nonphonetic in character, unlike the phonetic 
dissociation strategy requested by the analytic listening instructions.
Indeed, those few subjects who seemed to be successful analytic listeners in
Experiments 3 and 4 still failed to discriminate the stimuli at the very
shortest closure durations. It is likely that nonphonetic strategies would be
fostered by extensive training with feedback, which is one reason why this 
method was not used to induce analytic phonetic strategies. 

8 The d' values were also computed for individual subjects and then
averaged. The results were not substantially different from the global d' values
shown in Figure 5. Although certain distortions in the global values may have
occurred due to different degrees of criterion variability in different
experiments, the individual subjects' values are even more distorted because
of the many occurrences of response percentages of 0 and 100, which
necessitate setting an arbitrary upper limit for d'. For this reason, the d' values
computed from the average response percentages were preferred for this
informal comparison among experiments.
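
The need for an upper limit can be illustrated with a minimal sketch. It
assumes the conventional equal-variance definition, d' = z(H) - z(F), where H
and F are two response proportions being contrasted. The function name, the
choice of proportions, and the clipping floor of 0.01 are illustrative
assumptions, not the procedure actually used in these experiments.

```python
from statistics import NormalDist


def d_prime(hit_rate, false_alarm_rate, floor=0.01):
    """Equal-variance d' = z(H) - z(F), with proportions clipped.

    The inverse normal transform is undefined at exactly 0 and 1, so both
    proportions are clipped to [floor, 1 - floor]. The floor is an arbitrary
    (hypothetical) choice; it fixes the largest d' that can be reported.
    """
    ceiling = 1.0 - floor
    h = min(max(hit_rate, floor), ceiling)
    f = min(max(false_alarm_rate, floor), ceiling)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(h) - z(f)


# Response percentages of 100 and 0 hit the ceiling set by the floor.
capped = d_prime(1.00, 0.00)
# Equal proportions give no sensitivity at all.
chance = d_prime(0.50, 0.50)
```

Because each subject contributes relatively few responses, individual
proportions reach 0 or 1 far more often than pooled proportions do, so the
arbitrary ceiling affects individual d' values much more than the global ones.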

9 Of course, the white noise did not sound like a fricative noise (at
best, it sounded remotely [f]-like). For this reason, an integrative listen- 
ing condition, in which subjects try to interpret the noise as a fricative, 
was not considered. The point here is that, if psychoacoustic interactions 
are involved, they should not depend on the speechlikeness of the noise. 

10 This tendency, as well as its apparent increase with closure duration, 
was due to two subjects' data only. 

11 Another possible auditory interaction that was not considered seriously
here, but that may warrant some further investigation, is auditory short-term
adaptation (see Delgutte & Kiang, 1984). The [s] precursor should adapt
high-frequency neurons more than low-frequency neurons, so that the auditory
response to the following signal portion would be more vigorous in the
low-frequency regions, which might favor labial stop percepts. There are
several problems with that hypothesis, however: (a) The long temporal range of
the stop generation effect (Exp. 1) exceeds the range of auditory adaptation.
(b) The stop suppression effect remains unexplained. (c) The ability of some
subjects to disengage the stop generation effect argues against peripheral
auditory factors. (d) The [s] noise spectrum is not differentiated enough in
the low-frequency region to substantially alter the shape of the "auditory
spectrum" at the onset of the following signal. (e) The stop generation
effect is reduced by an increase in fricative noise duration (Exp. 1b; Repp,
1984c).
DEVELOPMENT OF THE SPEECH PERCEPTUOMOTOR SYSTEM*



Michael Studdert-Kennedy†



Introduction 



The intent of the present paper is to reflect on the development of the 
speech perceptuomotor system in light of the infant's evident capacity for in- 
termodal (or, better, amodal) perception, discussed by Meltzoff and by Kuhl in 
this volume. The central issue is imitation. How does a child (or, for that 
matter, an adult) transform a pattern of light or sound into a pattern of mus- 
cular controls that serves to reproduce a structure functionally equivalent to 
the model? The hypothesis to be outlined is that imitation is a specialized 
mode of action, in which the structure of an amodal percept directly specifies 
the structure of the action to be performed (cf. Meltzoff & Moore, 1983). 



Let us begin by considering briefly the function of perception from an 
ethological perspective (Gibson, 1966, 1979; von Uexküll, 1934). The general
function of perception is to control action. Perception and action are two 
terms in a functional system that permits an animal to survive. To survive, 
an animal must constantly negotiate a physical world, moving around, over or 
under objects in its path, seeking food or mates, escaping from predators. 
The actions that an animal takes, its coordinated patterns of goal-seeking 
movements, are more or less precisely matched to the world it perceives; and 
the world it perceives is constantly modulated by the actions it takes. Thus, 
action and perception are mutually entailed components of a single system: 
each fits the other as key fits lock. 

How is the fit achieved? How are the varying patterns of light, sound, 
temperature, pressure that determine perception transduced into the neuromus- 
cular patterns that determine action? Can we find a single set of descriptive 
terms that will match all the various sensory modalities with the single 
modality of action? We may approach an answer to these questions by asking 
another: What information do light, sound and other modes of energy convey? 
Following Gibson (1966, 1979) we answer quite generally: Information that 
specifies the structures of objects and events to which action must adapt. 

We may note two properties of perceived object-event structures. First, 
they are amodal. We perceive a desk, say, through a pattern of light
structured by its light-reflecting properties, or by touch through the pattern of
mechanical resistance it offers to our fingers. A bat, being equipped with 
sonar, might perceive the desk by virtue of the desk's sound-reflecting prop- 
erties. Similarly, we normally perceive a spoken word through a pattern of 
sound, structured by the coordinated articulations of a speaker. To the ex- 
tent that these articulations reflect radiant energy within the visible spec- 
trum, we may also perceive the word by virtue of its optical structure. The 
deaf-blind, using the Tadoma method, may even perceive the word by touch (Nor- 
ton, Schultz, Reed, Braida, Durlach, Rabinowitz, & Chomsky, 1977). What we 
perceive, then, are objects and events, independent, in principle, of the sen- 
sory modalities through which we perceive them. 

The second point to note about object-event structures is that their per- 
ceived qualities vary with the perceiving organism. The "same" object has 
different utilities for different animals, or for the same animal at different 
times. Objects and events differ in what von Uexküll (1934) termed their
"functional tones," what Gibson (1966) termed their "affordances." The puddle
that a person steps over affords a dog an opportunity to drink; the desk that
offers support for a writing pad on one occasion may serve as a seat on
another; a word spoken in Mandarin is merely a vocalization to someone who knows no
Chinese. Thus, different animals perceive different worlds (von Uexküll's
Umwelten), each structured by the animal's potential actions, just as its
actions are structured by its perceived world.



The Speech Percept as Amodal 

The first function of speech perception is social and communicative, a 
pragmatic function analogous to the general function of perception discussed 
above. As the carrier of language, speech offers meaning, that is to say 
(very broadly), information conveying the structure of a social world within 
which an individual may act. The individual, by acting in response, whether 
linguistically or non-linguistically, then modulates the perceived structure
of her social world. 

A second function of speech perception, ontogenetically prior to the 
first and of more immediate interest here, is in language acquisition. While 
the adult may listen simply for meaning, the learning child must listen both 
for meaning and for information specifying a talker's articulatory gestures. 
This second perceptual function therefore controls action in the more limited 
sense of providing a model for imitation. 

Before we consider imitation, let us explicate and justify the claim that 
speech carries information specifying a talker's articulatory gestures. No- 
tice, first, that this is not the customary account. For example, Abercrombie 
(1967) characterizes one form of the information conveyed by speech as 
linguistic and segmental, intending by this a sequence of phonetic elements, 
the consonants and vowels of a phonetic transcription. This is certainly cor- 
rect, at one level of description, as our ability to read and write alphabeti- 
cally demonstrates. However, a transcription is so far removed from the sig- 
nal that most people in the world who can speak and understand speech cannot 
read or write. 
What, then, is the difference between the information in a spoken
utterance and the information in its written counterpart? Following Carello,
Turvey, Kugler, and Shaw (1984) (see also Turvey & Kugler, 1984) we may say that
the difference is between information that specifies and information that
indicates. The information in a spoken utterance is not arbitrary: its
acoustic structure is a lawful consequence of the articulatory gestures that shaped
it. In other words, its acoustic structure is specific to those gestures, so
that a human listener (who knows the language spoken) has no difficulty in
following the specifications and organizing her own articulations to reproduce
the utterance. By contrast, the form of a written transcription is an
arbitrary convention, a string of symbols that indicate to the reader what she is
to do, but do not specify how she is to do it. The important point, here, is
that indicational information cannot control action in the absence of
information specific to the act to be performed. For example, a road sign indicates
that we are to stop, but we can only follow the instruction if we have
information specifying our velocity and our distance from the required stopping
point (Turvey & Kugler, 1984). Similarly, we can only reproduce an utterance
from its transcription, if we have information specifying the correspondences
between the symbol string and the motor control structures that must be
engaged for speaking. It is these correspondences that the illiterate has not
discovered. Just how these two forms of linguistic information are related
is, of course, a central issue of speech research. My concern here is merely
to make the distinction. For we shall be led astray in our study of speech
perception (and so of speech acquisition), if we strive to equate the
linguist's description of speech as a string of symbols with the dynamic
structure of the speech signal itself.

Consider, here, an early interpretation of the lip-reading studies of 
McGurk and MacDonald (1976). These authors discovered that listeners'
perceptions of a syllable presented over a loudspeaker could be changed, if they
simultaneously watched a videotape of a speaker producing another syllable. 
For example, presented with audio [ba] and video [da], subjects typically re- 
port the latter, optically specified syllable; presented with audio [na] and 
video [ba], subjects typically report [ma], a combination of the two. Such 
observations are consistent with the notion that subjects engage two indepen- 
dent phonetic systems, drawing manner and voicing features from the acoustic 
structure, place of articulation features from the optic structure (MacDonald 
& McGurk, 1978). This interpretation assumes that we perceive speech by 
extracting phonetic features and combining them to form phonetic segments — in 
other words, it assumes that the speech signal carries information about a 
string of linguistic symbols. As already remarked, this is true at one level
of description. However, this interpretation bypasses the actual event speci- 
fied by the dynamic acoustic-optic structure and does not address the puzzle 
of its transformation into a static linguistic symbol. 

Moreover, the featural interpretation breaks down in the face of other
findings. For example, presented with audio [ga] and video [ba], subjects
typically report a cluster [b'ga] or [g'ba]; presented with the reverse
arrangement, audio [ba] and video [ga], subjects often report a sort of
acoustic-optic blend, [da]. In these instances, the percept corresponds either to
both inputs or to neither, so that the notion of two independent and additive 
phonetic systems breaks down. 
While much remains to be done before we have a satisfactory account of
such findings, the effect seems to arise from a process by which two continu- 
ous sources of information, acoustic and optic, are actively combined at a 
precategorical level where each has already lost its distinctive sensory 
quality (Summerfield, 1979). In other words, the McGurk effect (and, indeed,
normal lipreading as practiced in aural rehabilitation) is only possible
because acoustic and optic structures specify an amodal event: a coordinated
pattern of articulatory action. 



A general capacity to imitate is rare among animals. The specialized 
capacity to imitate vocalizations is confined to a few species of birds and of 
marine mammals, and to man. Here we should distinguish between mimicry and 
repetition, or reproduction. The Indian mynah bird, for example, mimics human 
speech quite precisely, within the limits of its vocal apparatus (Klatt & 
Stefanski, 1974). However, a human speaker repeats the utterances of another 
(when not deliberately attempting mimicry) by producing a functionally
equivalent, though acoustically distinct, pattern of sound. Given within-species
individual differences in size and structure, we may reasonably suppose that
the production of distinct, yet functionally equivalent, acts is the normal
mode of animal imitation, whether in human speech or in, say, the
nest-building of a young chimpanzee. In any event, both mimicry and reproduction call
on a specialized capacity for finding in the perceptual array an organized 
pattern of information specific to an organized pattern of action. To find a 
pattern the imitator must find both the pieces of an act and their spatio-tem- 
poral relations (Fentress, 1984). 

Consider, for example, the following transcription of ten attempts by a 
15-month-old girl to say pen, within a single half-hour's recording session:
[maa, v A,de dl ?,hin, m bG, p h m, t h nt h nt h n,ba h ,d h au N ,bu3] (Ferguson & Farwell, 
1975). Note once again that the transcriptions are merely convenient (and
approximate) indicators of what the child did. For what the child evidently
did, in each case, was to extract from the sound pattern of pen information
specific to certain articulatory gestures, such as lip closure,
lingua-alveolar closure, velum lowering, glottal narrowing and spreading. Thus the child
analyzed the word (with varying success) into its component gestures, or
pieces, but could not discover, at least motorically, their spatio-temporal
relations. Perhaps we have here an instance of the necessary sequence in
learning to speak, or indeed in learning to reproduce an act performed by
another: first perceptual analysis, then motor synthesis. We can hardly doubt
that a capacity to perceive the pieces of an act and their relations, and to
reproduce them in our own behavior, rests on some form of structural
(anatomical, physiological) correspondence between imitator and model. This
observation leads us to a brief digression.

Can Non-human Animals Perceive Speech?

The answer to this question must depend on what we mean by "perceive
speech." Here we have been misled, it would seem, by the behaviorist view of
perception as a mere matter of psychophysical capacity. We have tended to
describe speech in purely acoustic terms as a collection of "cues," without
regard to the articulatory events that the cues specify, and then to suppose
that any animal able to discriminate these cues can perceive speech. Yet the
psychophysical capacities of an unlimited set of animals, from the human
infant to the chinchilla, may suffice to discriminate among formant transitions,
formant onset frequencies, brief silences, patches of noise, and so on. How- 
ever, these capacities may not suffice to discover the functional relations 
among the perceptual pieces. 

In fact, the perceptual status of communicative signals varies even for 
closely related species. For example, while two species of macaque (pig-tail 
and bonnet) and an African vervet may learn an arbitrary discriminative re- 
sponse to contrasting calls of the Japanese macaque, the latter learns the re- 
sponse significantly more rapidly (Zoloth, Petersen, Beecher, Green, Marler, 
Moody, & Stebbins, 1979). Moreover, the processes underlying the Japanese 
macaque's response to its own calls are evidently localized in the left cere- 
bral hemisphere, while those of the other two species of macaque are not 
(Petersen, Beecher, Zoloth, Moody, & Stebbins, 1978; cf. Heffner & Heffner,
1984). Whether this hemispheric specialization has a perceptuomotor origin
(as in the human: see below), we do not yet know. The point here is that, if
we show a particular discriminative task to be within the psychophysical 
competences of two different species, we have not thereby shown their percepts 
to be equivalent. 

In short, if the structure of perception can properly be said to be tuned 
to the structure of the perceiver's capacity for action, a non-human animal's 
perception of speech must differ radically from a human's. What actions of a
macaque, say, are controlled by its perception of speech? What events do the
acoustic patterns of speech specify for a macaque? Presumably, the patterns 
do not specify articulatory gestures, and the actions brought under control in 
the laboratory (such as lever holding or escape from shock) are the arbitrary 
choices of an experimenter, adventitious and ethologically empty. In other
words, the information in speech may indicate to a non-human animal what it
should do in a particular situation, but (pace the mynah bird) the information
cannot specify for the animal, as it does for a human, the speaker's pattern 
of articulatory gestures. 

Perceptuomotor Relations in the Infant 

Since the infant, by definition, does not speak, our understanding of 
perceptuomotor development over the first year of life must be largely 
inferential. Here I will consider three classes of evidence, concerning: (1) 
the adult perceptuomotor system, particularly its cerebral locus; (2) infant 
perceptual capacity; (3) infant behavior, reflecting hemispheric
specialization for speech perception.

The Adult Perceptuomotor System 

Aphasia studies for over a century have suggested that the right cerebral 
hemisphere of most right-handed individuals is essentially mute (see, for 
example, Milner, 1974). Differential anesthesia of left and right hemispheres
by intracarotid sodium amytal injection (preparatory to possible brain
surgery) has confirmed this fact experimentally (Borchgrevink, 1982; Milner,
Branch, & Rasmussen, 1964). Thus, speech motor control is vested in the left
hemisphere of most individuals (roughly 90% of the population). (The origins
of a population diversity, such that speech motor control is vested in the
left hemisphere for some 90%, in the right hemisphere for some 10% of the
population, are not yet understood.) 
Since any imitative behavior calls for close neurophysiological
connections between perceptual and motor processes, we might predict that left
hemisphere control of articulation would be coupled with left hemisphere
specialization for speech perception. Numerous monotic and dichotic studies of normal
subjects have confirmed this prediction, and have demonstrated a double
dissociation of left and right hemispheres for the perception of speech and
non-speech (e.g., Kimura, 1961a, 1961b; Studdert-Kennedy & Shankweiler, 1970).
Furthermore, studies of split-brain patients (whose cerebral hemispheres have 
been surgically separated for relief of epilepsy) have shown that, while the 
right hemisphere may recognize the meaning of a word from its overall auditory 
shape, only the left hemisphere can carry out the phonetic analysis necessary 
to establish a new word in an individual's lexicon (Zaidel, 1974, 1978). 
(Phonetic analysis refers, of course, to analysis of a word into its
articulatory components and to recognition of the relations among them, as discussed
above.) Thus, we have solid evidence that the adult speech perceptuomotor sys- 
tem is a left hemisphere function. 

Infant Perceptual Capacity 

As is well known, infants in the first six months of life can 
discriminate virtually any adult speech contrast on which they are tested (for 
reviews, see Aslin, Pisoni, & Jusczyk, 1983; Eimas, 1982). Much of the infant 
research has been carried out with synthetic speech continua on which adults 
typically display "categorical perception," that is, good discrimination be- 
tween sounds that fall into different adult phonetic categories, but poor 
discrimination between sounds that fall into the same phonetic category. 
Infants have generally displayed a similar pattern, and this outcome has been 
interpreted as evidence that infants are prepared at birth, or very soon 
after, to perceive speech in terms of adult phonetic categories (Eimas, 1982). 

This interpretation has been weakened by two sets of findings. First, we
now know that categorical perception is not peculiar to speech, nor even to
audition (e.g., Pastore et al., 1977). Second, Kuhl and her colleagues (Kuhl,
1978; Kuhl & Miller, 1978; Kuhl & Padden, 1983) have demonstrated categorical
discrimination along synthetic speech continua for macaques and chinchillas.
The issue is complicated by the fact that speakers of different languages may 
display different boundaries between the phonetic categories of a continuum 
(see Repp, 1984) and we may suspect (following the argument of the previous 
section) that quite different processes underlie the seemingly equivalent hu- 
man and animal behavior. However, let us assume that categorical perception 
is essentially a psychophysical phenomenon, susceptible perhaps to effects of 
learning and attention, but based on the psychoacoustic tuning of the
mammalian auditory system. 

Nonetheless, we have ample other evidence that speech already has a 
unique status for the infant within a few hours or days of birth. For exam- 
ple, neonates can discriminate speech from non-speech (Alegria & Noirot, 1978, 
1982), prefer speech to non-speech (Hutt, Hutt, Lenard, Bernuth, & 
Muntjewerff, 1968), and prefer their mother's voice to a stranger's (DeCasper 
& Fifer, 1980), provided she speaks with normal intonation rather than in 
word-by-word citation (Mehler, Barrière, & Jasik-Gerschenfeld, 1978).
However, the strongest evidence for the unique status of speech comes from studies
of infant hemispheric specialization. 
Cerebral Asymmetry for Speech in Infants

A number of studies have demonstrated dissociation of the left and right 
sides of the brain for perceiving speech and non-speech sounds at, or very
shortly after, birth. These include both physiological and behavioral
studies. For example, Molfese, Freeman, and Palermo (1975) measured auditory
evoked responses, over left and right temporal lobes, of 10 infants aged from
one week to 10 months. Their stimuli were four naturally spoken
monosyllables, a C-Major piano chord, and a 250-4000 Hz burst of noise. Median
amplitude of response was higher over the left hemisphere for all four syllables in
nine out of ten infants, higher over the right hemisphere for the chord and
the noise in all ten infants. Molfese (1977) has reported similar asymmetries 
for syllables and pure tones in neonates. 

Dissociation between responses to speech and non-speech has also been 
demonstrated by Best, Hoffman, and Glanville (1982). These authors tested 
forty-eight 2-, 3-, and 4-month-old infants for ear differences in a
memory-based dichotic task. They used a cardiac orienting response to measure
recovery from habituation to synthetic stop-vowel syllables and to Minimoog
simulations of concert A (440 Hz), played on different instruments. In the 
speech task, a single dichotic habituation pair (either /ba-da/ or /pa-ta/) 
was presented nine times at randomly varying intervals. On the tenth presen- 
tation, one ear again received its habituation syllable, while the other re- 
ceived a test syllable (either /ga/ or /ka/), differing in place of articula- 
tion from both habituation syllables. An analogous procedure was followed in 
the musical note task. The results showed significantly greater recovery of 
cardiac response for right ear test syllables in the 3- and 4-month-olds, and
for left ear musical notes in all age groups. The authors propose that
right-hemisphere memory for musical sounds develops before left-hemisphere
memory for speech sounds, and that the latter begins to develop between the 
second and third months of life. 

A further, particularly telling result, in light of the presumed amodal 
nature of the speech percept, comes from a study by MacKain, Studdert-Kennedy,
Spieker, and Stern (1983). These authors showed that 5- to 6-month-old
infants preferred to look at the face of a woman repeating the disyllable they
were hearing (e.g., [zuzi]) rather than at the synchronized face of the same
woman repeating another disyllable (e.g., [vava]). Thus, as in the study of Kuhl
and Meltzoff (1982; Kuhl, this volume), infant preferences were for natural 
structural correspondences between acoustic and optic information, specifying 
the same articulatory event. 

However, the most remarkable aspect of the study by MacKain et al. (1983) 
was that infant preferences for a match between the facial movements they were 
watching and the speech sounds they were hearing were only significant when 
the infants were looking to their right sides. We can interpret this result 
in the light of work by Kinsbourne and his colleagues (e.g., Kinsbourne, 1972; 
Lempert & Kinsbourne, 1982). Their work suggests that attention to one side
of the body may facilitate processes for which the contralateral hemisphere is 
specialized. If this is so, we may infer that infants with a preference for 
matches on their right side were revealing a left hemisphere sensitivity to 
articulation specified by acoustic and optic information.

The work by MacKain and her colleagues has not yet been replicated. But
if it proves reliable, we have some evidence that 5- to 6-month-old infants,
close to the onset of babbling, already display a left hemisphere sensitivity
to the amodal structure of speech events. For the moment, this seems to be
as close as we have come to detecting an incipient capacity for imitation on
which spoken language is based.



Summary and Conclusions

Perception and action are mutually entailed components of a single
system. Their interlocking operation is possible because the information
picked up by a perceptual system is amodal and directly specifies, within the
constraints of the actor's goal, the action to be performed.

Imitation is a specialized mode of action, requiring the imitator to find
in the act of a model both the pieces of the act and their spatio-temporal
relations. Imitation also calls for close neurophysiological connections
between perception and motor control. For speech these perceptuomotor
connections are localized in the left cerebral hemisphere.

Studies of infant speech perception have shown that infants are sensitive
to structural correspondences between acoustic and optic specifications of
speech, and that their left cerebral hemispheres are differentially activated
by speech sounds soon after birth. We also have preliminary evidence for left
hemisphere sensitivity to the amodal structure of speech by the fifth or sixth
month of life.

The approach to speech perceptuomotor development outlined above also
promises an ontogenetic solution to the vexed problem of the
incommensurability of the speech acoustic signal and its linguistic
description. The approach distinguishes between the dynamic information
conveyed by an act and the static information in a symbol string. Thus,
linguistic units are not postulated as part of the infant's native endowment.
Rather they are seen as elements that emerge from a self-organizing system of
perceptuomotor control (cf. Lindblom, MacNeilage, & Studdert-Kennedy, 1983).

References

Abercrombie, D. (1967). Elements of general phonetics. Chicago: Aldine.

Alegria, J., & Noirot, E. (1978). Neonate orientation behavior toward human
voice. International Journal of Behavioral Development, 1, 291-312.

Alegria, J., & Noirot, E. (1982). Oriented mouthing activity in neonates:
Early development of differences related to feeding experiences. In
J. Mehler, S. Franck, E. C. Walker, & M. Garrett (Eds.), Perspectives on
mental representation: Experimental and theoretical studies of cognitive
processes and capacities. Hillsdale, NJ: Erlbaum.

Aslin, R. N., Pisoni, D. B., & Jusczyk, P. W. (1983). Auditory development
and speech perception in infancy. In M. M. Haith & J. J. Campos (Eds.),
Infancy and the biology of development. Vol. II of Carmichael's manual
of child psychology, 4th edition. New York: Wiley and Sons.

Best, C. T., Hoffman, H., & Glanville, B. B. (1982). Development of infant
ear asymmetries for speech and music. Perception & Psychophysics, 31,
75-85.

Borchgrevink, H. M. (1982). Mechanisms of speech and musical sound
perception. In R. Carlson & B. Granström (Eds.), The representation of speech
in the peripheral auditory system (pp. 251-258). New York: Elsevier
Biomedical Press.

Carello, C., Turvey, M. T., Kugler, P. N., & Shaw, R. E. (1984).
Inadequacies of the computer metaphor. In M. S. Gazzaniga (Ed.), Handbook
of cognitive neuroscience. New York: Plenum.

DeCasper, A. J., & Fifer, W. P. (1980). Of human bonding: Newborns prefer
their mother's voices. Science, 208, 1174-1176.

Eimas, P. D. (1982). Speech perception: A view of the initial state and
perceptual mechanics. In J. Mehler, E. C. T. Walker, & M. Garrett
(Eds.), Perspectives on mental representation (pp. 339-360). Hillsdale,
NJ: Erlbaum.

Fentress, J. C. (1984). The development of coordination. Journal of Motor
Behavior, 16, 99-134.

Ferguson, C. A., & Farwell, C. B. (1975). Words and sounds in early language
acquisition: English initial consonants in the first fifty words.
Language, 51, 419-439.

Gibson, J. J. (1966). The senses considered as perceptual systems. Boston,
MA: Houghton-Mifflin.

Gibson, J. J. (1979). The ecological approach to visual perception. Boston:
Houghton-Mifflin.

Heffner, H. E., & Heffner, R. S. (1984). Temporal lobe lesions and
perception of species-specific vocalizations by macaques. Science, 226, 75-76.

Hutt, S. J., Hutt, C., Lenard, H. G., Bernuth, H., & Muntjewerff, W. J.
(1968). Auditory responsivity in the human neonate. Nature, 218,
888-890.

Kimura, D. (1961a). Some effects of temporal lobe damage on auditory
perception. Canadian Journal of Psychology, 15, 156-165.

Kimura, D. (1961b). Cerebral dominance and the perception of verbal stimuli.
Canadian Journal of Psychology, 15, 166-171.

Kinsbourne, M. (1972). Eye and head turning indicates cerebral
lateralization. Science, 176, 539-541.

Klatt, D. H., & Stefanski, R. A. (1974). How does a mynah bird imitate human
speech? Journal of the Acoustical Society of America, 55, 82-89.

Kuhl, P. K. (1978). Predispositions for the perception of speech-sound
categories: A species-specific phenomenon? In F. D. Minifie & L. L. Lloyd
(Eds.), Communicative and cognitive abilities: Early behavioral
assessment (NICHD Series) (pp. 229-255). Baltimore, MD: University Park
Press.

Kuhl, P. K., & Meltzoff, A. N. (1982). The bimodal perception of speech in
infancy. Science, 218, 1138-1144.

Kuhl, P. K., & Miller, J. D. (1978). Speech perception by the chinchilla:
Identification functions for synthetic VOT stimuli. Journal of the
Acoustical Society of America, 63, 905-917.

Kuhl, P. K., & Padden, D. M. (1983). Enhanced discriminability at the
phonetic boundaries for the place feature in macaques. Journal of the
Acoustical Society of America, 73, 1003-1010.

Lempert, H., & Kinsbourne, M. (1982). Effect of laterality of orientation on
verbal memory. Neuropsychology, 20, 211-214.

Lindblom, B., MacNeilage, P., & Studdert-Kennedy, M. (1983). Self-organizing
processes and the explanation of phonological universals. In
B. Butterworth, B. Comrie, & Ö. Dahl (Eds.), Explanations of linguistic
universals. The Hague: Mouton.

MacDonald, J., & McGurk, H. (1978). Visual influences on speech perception
processes. Perception & Psychophysics, 24, 253-257.

MacKain, K. S., Studdert-Kennedy, M., Spieker, S., & Stern, D. (1983).
Infant intermodal speech perception is a left hemisphere function.
Science, 219, 1347-1349.




McGurk, H., & MacDonald, J. (1976). Hearing lips and seeing voices. Nature,
264, 746-748.

Mehler, J., Barriere, M., & Jasik-Gerschenfeld, D. (1976). La reconnaissance
de la voix maternelle par le nourrisson. La Recherche, 70, 787-788.

Meltzoff, A. N., & Moore, M. K. (1983). Newborn infants imitate adult facial
gestures. Child Development, 54, 702-709.

Milner, B. (1974). Hemispheric specialization: Scope and limitations. In
F. O. Schmitt & F. G. Worden (Eds.), The neurosciences: Third study
program. Cambridge, MA: MIT Press.

Milner, B., Branch, D., & Rasmussen, T. (1964). Observations on cerebral
dominance. In V. S. DeRenck & M. O'Connor (Eds.), Disorders of language
(Ciba Foundation Symposium, pp. 200-214). London: J. & A. Churchill.

Molfese, D. L. (1977). Infant cerebral asymmetry. In S. J. Segalowitz &
F. A. Gruber (Eds.), Language development and neurological theory. New
York: Academic Press.

Molfese, D. L., Freeman, R. B., & Palermo, D. S. (1975). The ontogeny of
brain lateralization for speech and nonspeech stimuli. Brain and
Language, 2, 356-368.

Norton, S. J., Schultz, M. C., Reed, C. M., Braida, L. D., Durlach, N. I.,
Rabinowitz, W. M., & Chomsky, C. (1977). Analytic study of the Tadoma
method: Background and preliminary results. Journal of Speech and
Hearing Research, 20, 574-595.

Pastore, R. E., Ahroon, W. A., Baffuto, K. J., Friedman, C., Puleo, J. S., &
Fink, E. A. (1977). Common factor model of categorical perception.
Journal of Experimental Psychology: Human Perception and Performance, 3,
686-696.

Petersen, M. R., Beecher, M. D., Zoloth, S. R., Moody, D. B., & Stebbins,
W. C. (1978). Neural lateralization of species-specific vocalizations
by Japanese macaques (Macaca fuscata). Science, 202, 324.

Repp, B. H. (1984). Categorical perception: Issues, methods, findings. In
N. J. Lass (Ed.), Speech and language: Advances in basic research and
practice (Vol. 10). New York: Academic Press.

Studdert-Kennedy, M., & Shankweiler, D. P. (1970). Hemispheric
specialization for speech perception. Journal of the Acoustical Society of
America, 48, 579-594.

Summerfield, Q. (1979). Use of visual information for phonetic perception.
Phonetica, 36, 314-331.

Turvey, M. T., & Kugler, P. N. (1984). A comment on equating information
with symbol strings. American Journal of Physiology: Regulatory,
Integrative and Comparative Physiology, 246, R925-R927.

von Uexküll, J. (1934). A stroll through the worlds of animals and men. In
C. H. Schiller (Ed.), Instinctive behavior (reprinted 1957). New York:
International Universities Press.

Zaidel, E. (1976). Language, dichotic listening and the disconnected
hemispheres. In D. O. Walter, L. Rogers, & J. M. Finzi-Fried (Eds.),
Conference on Human Brain Function, Brain Information Service, BRI. Los
Angeles: UCLA Publications Office.

Zaidel, E. (1978). Lexical organization in the right hemisphere. In
P. A. Buser & A. Rougeul-Buser (Eds.), Cerebral correlates of conscious
experience (pp. 177-197). Amsterdam: Elsevier/North Holland Biomedical
Press.

Zoloth, S. R., Petersen, M. R., Beecher, M. D., Green, S., Marler, P., Moody,
D. B., & Stebbins, W. (1979). Species-specific perceptual processing of
vocal sounds by monkeys. Science, 204, 870-873.






DEPENDENCE OF READING ON ORTHOGRAPHY: INVESTIGATIONS IN SERBO-CROATIAN* 
Claudia Carello† and M. T. Turvey††



1 Introduction 



The relation between script and speech differs among the various ortho- 
graphic categories. In general, alphabets maintain a closer link than do 
logographies. Comparisons between instances of each category, say between En- 
glish and Chinese, are instigated in order to uncover whether or not different 
orthographic styles might be reflected in differing processing strategies used 
by readers. A number of investigators have pointed out, however, that "alpha- 
bet" does not constitute a monolithic category and English is, in no sense, to 
be taken as typical of all alphabets. Nonetheless, a majority of the reading 
data have been collected for English and the conclusions they suggest have 
been accepted, more or less by default, for alphabets in general. But a grow- 
ing body of data for Serbo-Croatian, the (alphabetically transcribed) language 
of Yugoslavia, reveals important differences with English. We will summarize 
these data and elaborate their implications for linguistic issues, particular- 
ly the role of phonology in reading, that may be important for Chinese. 



Orthographies can be distinguished along a number of dimensions, two of 
which will concern us here. First, they differ with respect to the particular 
units that are overtly represented, be they morphemes or syllables or the more
(linguistically) abstract phonemes. Second, orthographies can be considered 
deep or shallow depending on their relative remoteness from the sounds to be 
read. As will be illustrated in the following characterizations of Ser- 
bo-Croatian, English, and Chinese, these dimensions are orthogo- 
nal — orthographies of "equal depth" can differ in the unit represented. 

Serbo-Croatian uses an alphabet that represents phonemes in a
straightforward symbol-to-sound mapping: Each letter has only one
pronunciation. A novel word or pseudoword can be named (in the sense of
pronounced) simply by generating the sounds from the letters. A letter such as
"a" will be pronounced /a/ regardless of the letters that precede or follow it
(ignoring, of course, subtle changes as a consequence of coarticulation). In
order to preserve this mapping, the etymological relationships among words are
sacrificed. Wherever the spoken language has imparted phonological variation
in, say, declensions of a given noun, the variations are enforced in the
spelling (e.g., nominative singular RUKA, dative singular RUCI; nominative
singular SNAHA, dative singular SNASI). It is, therefore, considered to be a
shallow orthography (Liberman, Liberman, Mattingly, & Shankweiler, 1980).

In contrast, English uses an alphabet that also represents phonemes but 
enforces morphological continuity. Where the spoken language changes the 
pronunciation of a root morpheme, its spelling does not necessarily change. 
The sounds are determined by phonological rules with the result that etymolog- 
ical hints are retained (e.g., the relationship between "bomb" and "bombard" 
is preserved in their spellings despite alteration in the sound of the second 
"b"). A novel word or pseudoword can be named by generating the sounds from 
the letters and phonological rules. An alphabet that does not represent
phonological variations that are determined by phonological rule¹ can be said
to be deep.

Finally, Chinese uses a logography to represent morphemes. Although a 
large proportion of characters are phonograms — comprising both a semantic and 
a phonetic component — the hints to sound are not completely reliable (Wang, 
1973). Using the phonetic component to sound out a character yields only 39% 
accuracy (Tzeng & Hung, 1980). By and large, therefore, the character names 
must be memorized in order to be read. Because of the opacity of the phonolo- 
gy, Chinese can be considered a deep orthography. 

The fact that orthographies differ with respect to both the units they
represent and the phonological transparency of those representations suggests 
that orthographies might also vary in the linguistic demands that they place 
on the reader, particularly the beginner. In other words, the effective use 
of orthographies might depend on how much readers know about the structure of 
their languages, with certain orthographies requiring an explicit
understanding of the more abstract (and, presumably, harder to come by)
aspects. Limiting our discussion to structural units, speaker-hearers can
become aware of the words, morphemes, syllables, and phonemes that comprise
their spoken language. If they are to become readers of that language, alphabets require an
appreciation of the phonemic structure that logographies do not. Whatever the 
orthography, the level of linguistic awareness (Mattingly, 1972) must be 
compatible with the units represented, while using the orthography might be 
said to tune one to the level of awareness demanded. By this reasoning, flu- 
ent readers of Chinese are less likely to be aware of the phonemic structure 
of their language than are fluent readers of English because fluency in the 
morpheme-based orthography does not demand such awareness. 

A similar circular causality is found in what has been termed
phonological maturity (Liberman et al., 1980), the appreciation that readers
have, to varying degrees, of the (morpho-)phonological rules which rationalize
spellings that are related complexly to sound. That is to say, phonological
maturity helps in reading words where phonological variation is determined by
rule rather than orthographic representation (e.g., real is read /rel/,
reality is read /re.al'.at.e/); reading experience, in turn, promotes
phonological development. The demands of linguistic awareness and
phonological maturity can be said to parallel, more or less, the dimensions we
identified as distinguishing orthographies: the represented unit and its
phonological transparency, respectively.²

3 Serbo-Croatian: A Bi-alphabetic, Inflected Language



Phonological transparency is only one characteristic that distinguishes 
Serbo-Croatian from English. The major language of Yugoslavia is also highly 
inflected. Nouns, pronouns, and adjectives are declined in seven plural and 
seven singular cases (nominative, locative, dative, instrumental, genitive, 
accusative, and vocative). Verbs are conjugated by person and number in six 
forms. But, because of the dictum to "Write as you speak and read as it is 
written" (the guiding principle behind the mid-19th century alphabet reforms
directed by the Serbian language scholar Vuk Karadžić), root morphemes often
are varied orthographically when an inflectional element is added. 

Of primary relevance to transforming the linguistic issues of the last 
section into experimental questions, however, is the fact that Serbo-Croatian 
is written in two alphabets. Both the Cyrillic script (learned first in east- 
ern parts of the country) and the Roman script (learned first in the West) map 
onto the same set of 30 phonemes but in an interesting way. While most
letters are unique to one or the other alphabet, seven are common (i.e., are read
the same way in the two scripts) and four are ambiguous (i.e., receive a dif- 
ferent phonetic interpretation in each script). Since Yugoslavs are typically 
facile with both alphabets, the letters can be combined in a variety of ways 
for experimental purposes, which will become apparent in Section 5.0. 



4 Assessing Lexical Access

We are interested in whether or not variations in the speech-script
relationship promote differing processing strategies in reading. Since reading
involves recognizing words, one process that has received considerable
scrutiny is the pattern recognition step: how is a written letter string
matched to its lexical representation? This question of lexical access has
been addressed with (primarily) two paradigms: (1) In lexical decision tasks,
subjects must decide as rapidly as possible whether or not a given letter
string is a word; (2) In naming tasks, subjects must simply read the letter
string aloud as rapidly as possible. In both tasks, the time transpiring
between onset of the stimulus and initiation of the response is measured.
Visual and phonological characteristics of the letter strings are varied to
ascertain what effect, if any, they have on the response latencies.

Effects on lexical decision time are taken to have implications for the
nature of lexical access, models of which include linguistic processes
(phonological recoding of letter strings), nonlinguistic processes (simple
figural analyses), and combinations of both (dual processing). Effects on
naming may be consistent with one or another lexical route or may suggest,
further, that the lexicon need not be accessed at all in order to pronounce a
letter string. These implications rest on two logical underpinnings. First, if
a letter string is phonologically ambiguous (i.e., can be pronounced in more
than one way), then any phonological analysis (if it exists) ought to be
hindered in comparison to such an analysis on phonologically unique letter
strings. This would be true in both lexical decision and naming. If
phonological ambiguity produces no effect, the case for phonological analysis
is undermined. Second, while the three general models of word processing all
suggest that words should be named faster than pseudowords, a phonologically
analytic strategy ought to yield a fairly small difference that is relatively
constant for ambiguous and unambiguous letter strings. An interaction between
lexicality and phonological ambiguity, however, would seem to support one of
the other models. These will be elaborated in Section 6.0.

Obviously, a great deal hinges on the manipulation of phonological
ambiguity. In English, two methods have been used. In one, pseudowords are
constructed to be homophonic with words. While lexical rejection of
pseudohomophones takes longer than rejection of pseudowords (Coltheart, Dave- 
laar, Jonasson, & Besner, 1977), at least for good readers (Barron, 1978), 
interpretation of this fact is tricky because the appropriateness of 
pseudohomophones has been questioned on a number of grounds (Feldman, Lukate- 
la, & Turvey, 1985; Martin, 1982). These include (i) the possibility that 
phonetic representations may be sensitive to orthographic differences between 
letter strings that sound alike when spoken aloud; (ii) the formal distinc- 
tion, in English, between phonetic and morphophonological representations; and 
(iii) the suspicion that pseudohomophones are structurally odd. 

The second way in which phonological ambiguity has been manipulated in 
English is through a comparison of words with regular and irregular (or excep- 
tional) pronunciations. Whether or not differences are found, however, 
depends on how regularity is defined (Parkin, 1982). For example, words in 
which each graphemic unit receives the major phonemic correspondence (as de- 
tailed in Venezky's [1970] rules) are considered regular while those that 
receive a minor correspondence may be treated as irregular (Coltheart, Besner, 
Jonasson, & Davelaar, 1979). A finer distinction reveals that words can be 
classified as regular and consistent (i.e., they and all words that are visu- 
ally similar to them receive the major phonemic correspondences) or regular 
and inconsistent (i.e., they receive major correspondences but other exemplars 
receive minor correspondences and, thus, are irregular [Glushko, 1979]). Some
irregular words might be considered especially exceptional, however, if only 
because lexicographers provide pronunciation guides for them (but not for all 
minor correspondence words [Parkin, 1982]). Moreover, a particular 
grapheme-phoneme correspondence will be considered minor and, therefore, 
exceptional because there are fewer instances of it when, in fact, those in- 
stances might occur with greater frequency than the so-called major 
grapheme-phoneme correspondences (Parkin, 1982). Lastly, phonologically 
irregular words may differ with respect to whether or not they are orthograph- 
ically irregular as well (Parkin & Underwood, 1983). Depending on which of 
these characterizations of regularity is used, one will or will not find 
differences between regular and irregular words, either supporting or belying 
claims for phonological analysis. 

As important as the phonological manipulation is to evaluating lexical 
properties, it is not clear that studies in English have been successful in 
providing unequivocal tests. The task is much more straightforward in Ser- 
bo-Croatian, however, where the unique properties of the orthography can be 
exploited. In the following review, we will focus on the bi-alphabetism of 
fluent readers. 

5 Reading in Serbo-Croatian Is Phonologically Analytic

Because Serbo-Croatian is phonologically shallow, there are no minor 
phonemic correspondences, no irregular words nor inconsistent regular words, 
and no orthographically irregular words. Phonological ambiguity is manipulat- 
ed by choosing words (or nonwords) that combine common letters with unique 
letters (unambiguous letter strings) or common letters with ambiguous letters 
(ambiguous letter strings). The lexical status of letter strings so chosen 
will depend on their phonemic interpretation— that is, in which alphabet they 
are read. For example, an ambiguous string could be a word in Cyrillic but a
pseudoword in Roman (or vice versa). Or it could be one word in Cyrillic but
a different word in Roman (or pseudowords in both). An unambiguous string
could be a word in one alphabet and impossible in the other (or a pseudoword
in one and impossible in the other). Finally, if composed exclusively of
common letters, a string would be the same word in both alphabets (or the same
pseudoword).
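
The classification just described can be sketched in a few lines of Python. This is an illustrative sketch, not the procedure used in the studies cited. The letter inventories are assumptions: the text states only that seven letters are common and four are ambiguous, and the treatment of B and C follows the KOBAC example given later in this section; the remaining letters, the partial lists of uniquely Roman and uniquely Cyrillic letters, and the function name classify_string are supplied for illustration only.

```python
# Illustrative sketch only: the letter inventories are assumptions,
# not taken from the text. Shared letter shapes are stored as Roman
# code points; Cyrillic look-alikes are mapped onto them first.
LOOKALIKES = str.maketrans(
    "\u0410\u0415\u041E\u0408\u041A\u041C\u0422\u041D\u0420\u0421\u0412",
    "AEOJKMTHPCB",
)
COMMON = set("AEOJKMT")          # assumed: read the same way in both scripts
AMBIGUOUS = set("HPCB")          # assumed: read differently in each script
ROMAN_ONLY = set("DFGILNRSUVZ")  # assumed, partial list
CYRILLIC_ONLY = set(             # assumed, partial list
    "\u0411\u0413\u0414\u0416\u0417\u0418\u041B\u041F\u0423\u0424\u0426\u0427\u0428"
)


def classify_string(letters: str) -> str:
    """Classify a letter string by the alphabets in which it can be read."""
    chars = set(letters.upper().translate(LOOKALIKES))
    unknown = chars - (COMMON | AMBIGUOUS | ROMAN_ONLY | CYRILLIC_ONLY)
    if unknown:
        raise ValueError(f"letters outside the sketch's inventory: {sorted(unknown)}")
    has_roman = bool(chars & ROMAN_ONLY)
    has_cyrillic = bool(chars & CYRILLIC_ONLY)
    if has_roman and has_cyrillic:
        return "impossible"              # mixes the two alphabets
    if has_roman:
        return "unambiguous (Roman)"     # a unique letter fixes the alphabet
    if has_cyrillic:
        return "unambiguous (Cyrillic)"
    if chars & AMBIGUOUS:
        return "ambiguous"               # readable in either alphabet, differently
    return "common"                      # same reading in both alphabets


print(classify_string("KOBAC"))  # ambiguous
print(classify_string("RUKA"))   # unambiguous (Roman)
```

Whether a classified string is a word or a pseudoword under each reading is a separate, lexical question that this sketch does not address.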

In lexical decision tasks, comparisons of response times to the variety
of letter string types reveal a phonological ambiguity effect: an ambiguous
letter string takes longer to decide about than an unambiguous letter string.
This is true when it is (i) a word in one reading and a pseudoword in the
other; (ii) a word, though different, in both readings; and (iii) a pseudoword,
though different, in both readings (Lukatela, Popadić, Ognjenović, & Turvey,
1980; Lukatela, Savić, Gligorijević, Ognjenović, & Turvey, 1978). The effect
is more pronounced with words than pseudowords (Feldman & Turvey, 1983;
Lukatela et al., 1978). The greater the number of ambiguous letters in the
string, the longer lexical decision takes (Feldman, Kostić, Lukatela, &
Turvey, 1983; Feldman & Turvey, 1983). While attempts to bias subjects toward a
Roman reading by instructions or task (i.e., uniquely Cyrillic letters never 
appear) did not eliminate the effect, the presence of a single unique charac- 
ter did (Feldman et al., 1983; Lukatela et al., 1978). Finally, the effect is 
more pronounced in good readers than in poor readers (Feldman et al., 1985), 
suggesting that those who more effectively exploit the phonologically analytic 
strategy are harmed more by ambiguity. 

It is important to note that the phonological ambiguity effect is not an 
artifact of the frequency of ambiguous letter strings. These occur regularly 
in the Serbo-Croatian language. But the point is underscored nicely by two 
experimental findings. First, in a comparison of two inflected forms of the 
same noun, frequency is (at one level) equal since they are the same word 
(e.g., RUKA and RUCI both mean hand). But the occurrence of the various 
grammatical cases differs such that nominative singulars (e.g., RUKA) are at 
least ten times more frequent than dative singulars (e.g., RUCI). When both
forms are unique letter strings, the latency for nominatives is (about 80 ms) 
shorter. When the nominative singular is ambiguous and the dative singular is 
unambiguous (i.e., has one unique character), latency for datives is (about 
185 ms) shorter (Feldman et al., 1983). Phonological ambiguity overrides the 
frequency advantage. 
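
The size of the override can be made concrete with a line of arithmetic. The sketch below assumes, as a simplification not made in the studies cited, that the frequency and ambiguity effects simply add; under that assumption the two reported differences imply the ambiguity cost shown. The variable names are illustrative.

```python
# Reported differences (Feldman et al., 1983), in milliseconds.
FREQ_ADVANTAGE_MS = 80   # nominative faster than dative when both are unique
REVERSAL_MS = 185        # dative faster when the nominative is ambiguous

# Assumption (not from the source): the two effects are additive, so the
# ambiguity cost must cancel the frequency advantage and then produce
# the observed reversal.
implied_ambiguity_cost_ms = FREQ_ADVANTAGE_MS + REVERSAL_MS
print(implied_ambiguity_cost_ms)  # 265
```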

The second rejoinder to frequency arguments comes from a comparison of 
words that are ambiguous in one alphabetic transcription but unique in the 
other. For example, the Cyrillic version of "hawk," КОБАЦ, is unique
(pronounceable only as /kobats/) while its Roman version, KOBAC, is ambiguous
(pronounced /kobats/ if read as Roman but /kovas/ if read as Cyrillic). With
such pairs, a word can be used as its own control: Frequency, meaning, 
length, number of syllables are identical. Only the number of morphophonolog- 
ical representations is different but that is sufficient to produce a 350 ms 
difference in decision time (Feldman, 1981). 

6 Word-pseudoword Comparisons

As indicated in Section 4.0, the three general models of word processing 
agree that words should be named faster than pseudowords. Their reasons are 
quite different, however, as are the particulars of how lexicality might
interact with phonological ambiguity. A model of visual analysis suggests that
words and pseudowords are read aloud by a common analogical process. Very
roughly, a word finds a perfect analogy in the lexicon, with a singularly
defined code for pronunciation; a pseudoword finds several analogies in the
lexicon, defining several alternative pronunciations. The competition among 
lexical entries induced in the case of pseudowords would account for their 
slower naming relative to words (e.g., Glushko, 1979; Kay & Marcel, 1981). 
The effects of such competition ought to be especially (perhaps exclusively) 
apparent in experiments that compare phonologically ambiguous letter strings. 

A model of phonological analysis holds that words and pseudowords are 
read aloud by a common phonological strategy that uses spelling-to-sound rules 
(based on the same principle as, though not necessarily identical to, the 
grapheme-to-phoneme correspondences identified by Venezky [1970]). Very 
roughly, the more regular the letter string the more rapid the recoding. As a 
rule, pseudowords will be less phonologically regular than words, resulting in 
slower naming latencies (e.g., Parkin, 1982; Parkin & Underwood, 1983). This 
residual difference should not change when both types of letter strings are 
chosen to be purposely ambiguous. 

Finally, a dual process view asserts that words are read aloud by a visu- 
ally based look-up of a word's lexical representation where the word's 
pronunciation can be retrieved. In contrast, pseudowords are read aloud by 
assembling a pronunciation on the basis of grapheme-phoneme correspondences. 
It is hypothesized that visual access is faster than rule-based assembly; 
consequently, words are named more rapidly than pseudowords (e.g., Coltheart,
1978; Coltheart et al., 1979). Phonological ambiguity should affect only 
pseudowords since their names alone are derived phonologically. 
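
The contrasting predictions can be stated as a difference of differences. The sketch below is illustrative only: the latencies are hypothetical numbers chosen to show the two patterns, not data from any study, and the function name interaction_contrast is an assumption. A contrast near zero is the pattern expected under a common phonological strategy; a clearly nonzero contrast is the pattern expected when ambiguity affects words and pseudowords unequally, as the other two models would have it.

```python
from typing import Dict, Tuple

# Mean naming latency (ms) keyed by (lexicality, ambiguity).
Latencies = Dict[Tuple[str, str], float]


def interaction_contrast(rt: Latencies) -> float:
    """Pseudoword-minus-word difference for ambiguous strings,
    minus the same difference for unambiguous strings."""
    diff_ambiguous = rt[("pseudoword", "ambiguous")] - rt[("word", "ambiguous")]
    diff_unambiguous = rt[("pseudoword", "unambiguous")] - rt[("word", "unambiguous")]
    return diff_ambiguous - diff_unambiguous


# Hypothetical pattern 1: ambiguity slows words and pseudowords equally.
additive: Latencies = {
    ("word", "unambiguous"): 600,
    ("pseudoword", "unambiguous"): 650,
    ("word", "ambiguous"): 900,
    ("pseudoword", "ambiguous"): 950,
}

# Hypothetical pattern 2: ambiguity slows pseudowords only.
pseudoword_only: Latencies = {
    ("word", "unambiguous"): 600,
    ("pseudoword", "unambiguous"): 650,
    ("word", "ambiguous"): 600,
    ("pseudoword", "ambiguous"): 950,
}

print(interaction_contrast(additive))         # 0
print(interaction_contrast(pseudoword_only))  # 300
```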

In Serbo-Croatian, at least, it appears that the difference in naming 
latencies between words and pseudowords does not change when phonological 
ambiguity is manipulated (Feldman, 1981). Both are slowed by about 450 ms 
when the letter strings can be read in two ways, suggesting that phonological 
involvement is the same for words and pseudowords. Certainly, this strategy 
is encouraged by the fairly direct correspondence to speech that the Ser- 
bo-Croatian orthographies exhibit. One might expect a different pattern with 
English, where the correspondence between orthography and speech is abstract. 
While English and Serbo-Croatian have not been compared directly (i.e., in the 
same experiment with the same controls) on the lexicality-ambiguity interac- 
tion, the direct comparisons that have been performed reveal differences be- 
tween the languages that are germane to this issue. Since these involve a 
manipulation — semantic priming — that we have not yet discussed, we'll take a 
moment to describe its logic before summarizing the results.

It is commonly found that lexical decision and naming are facilitated 
when the target word is preceded by a semantically related priming word (Beck- 
er & Killion, 1977; Massaro, Jones, Lipscomb, & Scholz, 1978; Meyer, 
Schvaneveldt, & Ruddy, 1975). The general assumption is that when the prime
activates its own lexical representation, that activation spreads to semanti- 
cally related items, thereby speeding their subsequent lexical processing. 
Tasks that are lexically mediated ought to be facilitated; tasks that are not 
facilitated are unlikely to be lexically mediated. 

Semantic priming of lexical decision is, in fact, found in both English 
and Serbo-Croatian (Katz & Feldman, 1983). For naming, however, facilitation 
is found only for English, suggesting that naming in the phonologically
shallow Serbo-Croatian orthography need not involve the lexicon. This point is
underscored by the correlations between lexical decision and naming (which may
be taken as an index of processing similarity). In English, performance on
semantically primed lexical decision correlates with naming, whether the
latter is semantically primed or not; lexical decision without semantic priming
also correlates with naming, whether primed or not. In Serbo-Croatian, the
only significant correlation occurred when neither task was semantically
primed. "The similarity between tasks is strongest when there is least
involvement of the internal lexicon" (Katz & Feldman, 1983, p. 163).

7 Conclusion 

The case for phonological analysis as the primary, not optional, reading
strategy in Serbo-Croatian is quite strong. It is not yet clear, however,
whether or not this strategy is peculiar to Serbo-Croatian (or writing systems
with similar properties): Does phonological analysis result from experience
with a shallow orthography (i.e., does orthography influence processing) or is
it simply easier to demonstrate in the sorts of experiments that the
Serbo-Croatian orthography allows?

As strongly as we argue for a phonologically analytic strategy in
Serbo-Croatian, others have claimed that Chinese characters can only be read via
the visual route. Indeed, lexical decision is slowed by a visual manipulation
wherein the internal components of two-character words (and nonwords) are
distorted disproportionately [example characters illegible in the source] (Hung,
Tzeng, Salzman, & Dreher, 1984). This parallels the result for mixing upper
and lower case letters in English (e.g., Coltheart & Freeman, 1974) but is in
contrast to mixing Cyrillic and Roman letters in Serbo-Croatian. The latter
slows neither lexical decision nor naming (Feldman & Kostić, 1981; Katz &
Feldman, 1981). Interestingly, however, visual distortion in both Chinese and
English affects poor readers more than good readers (Hung et al., 1984). This
is puzzling if one assumes that the manipulation interferes with the putatively
optimal strategy on which better readers ought to be more reliant.
Serbo-Croatian, at least, follows the expected logic for a phonologically
analytic strategy: good readers are hurt more by phonological ambiguity (Feldman et
al., 1985).

We do not know if fluent readers of Chinese rely on some strategy other
than visual analysis or if they can resort to some other strategy if the
visual route is hindered. We do know that there are hints of some phonological
analysis of Chinese characters. Detection of graphemic components (e.g.,
[character] /tai/) is more successful when the component carries a phonetic
clue (as in [character] /tai/) than when it does not (as in [character] /yi/
[Hung & Tzeng, 1981]). Inconsistent characters take longer to name than
consistent characters (where consistency is defined by the ratio of exemplars
pronounced the same as the target to the total number of characters with that
phonetic, regardless of how they are pronounced [Fang & Horng, this volume]).
And a comparison of Japanese kanji (the logographic script borrowed from
Chinese) with kana (a syllabary that depicts the phonetic value of its
characters) reveals that colors are named faster when written in kana even
though color names appear more frequently in kanji in Japanese literature
(Feldman & Turvey, 1980; cf. Saito, 1981). This last finding, especially, seems
troublesome for those models that restrict the role of phonological analysis.
Phonological involvement is demonstrated for words (not just pseudowords) and
it appears to facilitate, rather than slow, naming. One might argue that if
phonological analysis is optional, then it is an option readily (eagerly?)
exploited when available, even in writing systems that are biased, by design
and practice, in favor of visual analysis (cf. Brooks, 1977).
Footnotes

1 This is akin to Klima's (1972) third convention.

2 We find these parallels to be pedagogically useful but they may be
idiosyncratic and should not be taken as representative of how linguistic
demand is characterized typically. For example, Mattingly (1984) has recently
revised his distinction of phonological maturity and linguistic awareness as
entailing grammatical knowledge and access to such knowledge, respectively.
We are less able to use this distinction for our present purpose of
classifying orthographies.




THE RELATIONSHIP BETWEEN KNOWLEDGE OF DERIVATIONAL MORPHOLOGY AND SPELLING 
ABILITY IN FOURTH, SIXTH, AND EIGHTH GRADERS 



Joanne F. Carlisle†

Abstract. This study investigated young students' knowledge of
derivational morphology and the relationship between this knowledge
and their ability to spell derived words. The subjects (fourth,
sixth, and eighth graders) were given the Wide Range Achievement
Test, Spelling subtest, and several experimental tasks: 1) a test of
their ability to generate base and derived forms orally; 2) a
dictated spelling test of the same base and derived words; and 3) a
test of their ability to apply suffix addition rules. The results
indicate strong developmental trends in both the mastery of
derivational morphology and the spelling of derived forms; however,
spelling performances lagged significantly behind the ability to
generate the same words. Success generating and spelling derived
words depended on the complexity of the transformations between base 
and derived forms. Further, mastery of phonological and orthograph- 
ic transformations most strongly distinguished the three grades in 
both spelling and generating derived forms. Other indications that 
the older students were using knowledge of morphemic structure in 
spelling derived forms were found in analysis of the spelling of 
base and derived word pairs and the application of suffix addition 
rules. However, incomplete mastery of the phonological and ortho- 
graphic transformations suggests that students might benefit from 
explicit instruction in morphemic structure in order to improve 
their spelling of derived words. 

Introduction 

It is commonly acknowledged that learning to spell English words requires 
an understanding of the relationships between phonemes and graphemes and a 
memory for those words or parts of words that are "irregular." However, since 
our orthography is morphophonemic, it seems reasonable to believe that a 
knowledge of the morphemic structure of words would be helpful, perhaps even
necessary, to spell accurately the many words of more than one morpheme that
we use in writing. Although we know that understanding morphology develops
gradually from childhood to adulthood, little is known about the extent to
which this knowledge helps an individual acquire proficiency in spelling. The
present study is concerned with the spelling of derived forms and addresses 
the question, is there a relationship between knowledge of derivational 
morphology and spelling ability? 

Although the relationship between morphological knowledge and the 
acquisition of spelling skill would seem to have educational relevance, there 
have been very few investigations of the matter. The paucity of research 
studies is surprising since quite a few theorists have suggested that 
sensitivity to morphemic structure should enhance the ability to spell English 
words (Frith, 1980; Henderson, 1982; Liberman, Liberman, Mattingly, & Shank- 
weiler, 1980; Mattingly, 1980; Venezky, 1970) and that explicit instruction in 
the morphemic structure of words could have benefits for the student learning 
to spell derived words (Chomsky, 1970; Russell, 1972). 

The learning of derivational morphology is a complex matter. Although 
not a necessary part of the grammar of the language, the affixes allow us to 
express a concept (e.g., love) in a number of different grammatical forms,
usually while retaining the basic identity of the base form (e.g., lovable,
lovely, loveliness). While having familiar morphemes in many different words
offers ease and efficiency in conveying meaning, this benefit accrues only if 
we are able to appreciate the morphological relationship between different 
words in the same word family. Unfortunately, the distance between base and 
derived form in phonology and semantics can sometimes be a formidable barrier.
As Klima (1972) suggests, it is questionable whether most adult speakers of 
English recognize the many relatively obscure morphological relationships that 
exist in the English language. How many, for example, are aware that crux and 
crucial are members of the same word family? 

Both the range and the complexity of the phonological transformations 
from base to derived forms may make derivational relationships hard to 
appreciate. While Chomsky and Halle (1968) have proposed that the phonologi- 
cal changes from base to derived forms are orderly and ruleful, a number of 
researchers have questioned the psychological reality of the underlying
phonological rule system (Barganz, 1971; Jaeger, 1984; Moskowitz, 1973; Steinberg,
1973; Templeton, 1980). Collectively, these suggest that children and adults
have varied degrees of understanding of the underlying phonological rule sys- 
tem. 

Several characteristics of derivational morphology make productive knowl- 
edge problematic. First, the construction of derived forms does not follow 
consistent patterns. For example, two quite similar words such as terror and 
horror have only some of the same derived forms (Richardson, 1977). They have 
in common terrible and horrible, terrify and horrify; on the other hand, there
is terrorize, but not horrorize, and horrid, but not terrid. Second, the range
of syntactic options makes learning the proper derived forms complex. Derived
nouns, for example, can end in -ity, -ment, -ness, -ence, and -th, just to
name a few variations. In some cases, a base word has several
derived forms of the same part of speech, such as honestness and honesty or
bountiful and bounteous. Third, differences in the meanings of the suffixes
are often subtle or nonexistent. In fact, the same suffix can have different
meanings, depending on the word it is attached to (Thorndike, 1941). For
example, the suffix -ful has different meanings in the words cupful and
helpful. Finally, derived forms sometimes undergo semantic shifts that make their
relationship to their base forms seem remote. This is the case with apply and
appliance. When similarity in meaning is absent, the realization that the
base and derived forms are related requires more linguistic sophistication 
than many individuals have. 

In view of such complexities, it is possible that the learning of 
word-specific patterns may play an important role in the learning of deriva- 
tional morphology. Awareness of the morphological relatedness of words and 
the ability to analyze morphemic structure may depend on combined features of 
phonological and semantic similarities and associations, on linguistic 
sophistication and even on the specific characteristics of the language tasks 
used to assess this ability (Derwing & Baker, 1979; Smith & Sterling, 1982).

The Development of Knowledge of Derivational Morphology 

Children learn inflected forms of words rulefully. Their knowledge of 
most inflectional rules, evident from the ability to supply the correct forms 
of nonsense words in sentences, is generally complete by the time they are se- 
ven years old (Berko, 1958). Derivational rules, however, are learned more 
slowly and less systematically than inflectional rules. Children's vocabulary 
growth during the years 7 to 12 includes many words of complex morphological 
structure, particularly derived forms (Ingram, 1976). To some extent, 
morphophonemic rules appear to be learned during this time (Moskowitz, 1973).
However, the productive knowledge of even basic derivational forms may not be 
complete even for teenagers (Selby, 1972). In fact, since derivational 
morphology is an open system, learning derived forms can take place throughout 
adulthood for individuals who have some curiosity about words (Klima, 1972). 

Although derivational morphology cannot be said to be mastered within a
particular developmental period, certain developmental trends in ruleful 
learning of derived forms have been found. Using a task modeled after 
Berko's, Derwing (1976) found a consistent trend among children (ages 8 to 
12), adolescents, and adults toward productive knowledge of five of the six 
derivational patterns he selected for investigation. These were the agentive 
-er, the adjective, noun compound, instrumental -er, and the -ly adverb.
(The sixth pattern was the diminutive, which did not become productive.) The 
developmental trend toward mastery found in this study suggests that the 
learning of derived forms begins soon after age seven when the inflected forms 
have usually been mastered, a phenomenon also evident from Moskowitz's study 
(1973). 

The constructions that Derwing found to be productive are regular and 
quite transparent. The base word remains intact in the derived form and does 
so without requiring a change in the phonology of the base word. Not all
derivational relationships are so regular in construction or so closely related
in phonology and orthography (Berko, 1958). There is less evidence to suggest
that children have productive knowledge of those forms with complex
phonological and semantic relationships. 

Morphological Knowledge and Spelling Ability

English orthography maps onto the morphophonology of the language. Chomsky
and Halle (1968) note that, where changes in pronunciation from a base to a
derived word are predicted by the regular sound pattern of the language, the
orthography does not need to reflect the change (e.g., race to racial and
reduce to reduction). A number of studies have shown that the orthographic
regularities seem to provide the reader with clearer clues to morphological
relationships than the underlying phonological rules (Barganz, 1971; Jaeger, 
1984; Jarvella & Snodgrass, 1974; LaSorte, 1980; Moskowitz, 1973; Steinberg, 
1973; Templeton, 1980). The reader who can discover from the regularity of 
the spelling that two words are morphologically related can use this knowledge 
to good advantage through efficient processing of words and through apprecia- 
tion of semantic relationships and syntactic variations. It is not surpris- 
ing, therefore, that there appears to be quite a strong relationship between 
morphological knowledge and reading or vocabulary development (Barganz, 1971; 
Freyd & Baron, 1982; LaSorte, 1980). 

The issue we are addressing here, however, is not whether orthographic
regularities help the reader, but whether they are useful to the spell- 
er — whether knowledge of the morphemic structure of words, which may be more 
apparent from the orthography than the phonology, is drawn upon by the speller 
of derived words. Reading and spelling, though closely related, are quite 
different tasks (Frith, 1980). C. Chomsky (1970) argues that the use of 
orthographic knowledge to spell derived words correctly is a natural develop- 
ment, at least for the good speller who can recall the orthographic 
similarities of related words, even when the pronunciations are dissimilar. 
She suggests that the spellers' knowledge of word families can help disambigu- 
ate such troublesome elements as the spelling of an unstressed vowel, as in 
democracy (where knowing democrat helps) or a silent consonant, as in muscle
(where knowing muscular helps). Russell (1972) believes that the phonological
and orthographic regularities, apparent from reading words, can be emphasized
in instruction in spelling. However, neither Chomsky nor Russell offers
direct evidence to support the position that knowledge of morphological
structure helps the speller spell derived words correctly.

While studies of the spelling of young children give some indication of a 
growing awareness of morphemic structure (Marino, 1979; Rubin, 1984; Schwartz
& Doehring, 1977), we do not know if an awareness of simple morphemic struc- 
ture carries over to the spelling of derived forms, particularly those that 
undergo phonological or orthographic shifts. How well an individual speller 
can apply morphological knowledge to the task of spelling may depend on how 
explicit as well as how extensive this knowledge is. It may also depend on 
the speller's mastery of the orthographic conventions that govern the addition 
of suffixes to base words.

Two studies have looked at the spellers' ability to use morphological 
knowledge. One was an investigation of the use of phonological knowledge and
orthographic knowledge in a dictated spelling task (real and nonsense words)
involving good spellers at the sixth-, eighth-, and tenth-grade levels
(Templeton, 1980). The results of this study suggest that seeing a base word
prompted better recall of the phonological rules governing the spelling of
derived forms than hearing the base word. In addition, the students could spell
the nonsense derived words better than they could pronounce them. Templeton
suggested that learning about the orthographic structure of derived words
might bring about a more comprehensive and productive awareness of the
underlying phonological rules.

The second study of spellers' sensitivity to morphemic structure was of 
good and poor spellers at the college level (Fischer, Shankweiler, & Liberman, 
1985). Good spellers were much better than poor spellers at spelling
morphophonemically complex words. This discrepancy was particularly striking
because the two groups differed less in their ability to spell words that are
orthographically transparent (adverb) or orthographically deviant
(Fahrenheit). Performances on additional tasks suggested that differences in
spelling morphophonemically complex words were attributable to differences in
linguistic knowledge, specifically knowledge of morphological structure. The 
good spellers were superior to the poor spellers on nonsense tasks of prefixa- 
tion and suffixation, suggesting that they were not simply better spellers of 
real words. 

While these two studies seem to indicate that good spellers between the 
sixth grade and college level can use morphological knowledge to help them 
spell derived words, this pattern may not hold for poor spellers at these lev- 
els in school or for younger students. Spelling errors made by junior high 
school students have been observed to indicate lack of awareness of morphemic 
structure (e.g., easally for easily) (Carlisle, 1984). Similarly, in an
analysis of spelling errors on compositions, Sterling (1983) found that
12-year-old students treated derived words as if they were monomorphemic 
words. His analysis of the students' spelling errors indicated that inflected 
forms were spelled by a ruleful system, but derived forms were spelled as
unanalyzed wholes. He suggested that access to the knowledge of morphological
relationships may be obscured by the complex nature of derivational morphology.

Experiment 

The general purpose of the present study was to investigate the early
stages of acquisition of knowledge of derivational morphology and of the abil- 
ity to spell derived words. Several different considerations guided the 
formulation of the questions and the design of the study. First, on the basis 
of investigations by Berko (1958), Derwing and Baker (1979), and Selby (1972), 
it was expected that learning derivational morphology would begin in the third 
or fourth grades, following the mastery of the inflected forms. Accordingly, 
students in the fourth, sixth, and eighth grades were chosen as subjects in 
order to provide insight into the developmental mastery of derivational 
morphology. Second, the study was based on the hypothesis that students do 
acquire ruleful knowledge of the derivational morphology and that they do not 
simply learn to spell derived forms as unanalyzed whole words. 

The research questions were as follows: First, are there developmental 
trends between the fourth and eighth grades in the acquisition of morphologi- 
cal knowledge and knowledge of the spelling of derivatives? Second, is there 
a relationship between the knowledge of derivational morphology and the abili- 
ty to spell derived forms in the fourth, sixth, and eighth grades? Third, is 
there evidence that the learning of derivational morphology and the spelling
of derived forms is ruleful in nature, taking into account both phonological
and orthographic transformations? 

In order to investigate these issues, two tasks were devised to allow for 
direct comparison of the two skills: an oral test of the ability to generate
derived forms and a dictated spelling test using the same words. The words 
were chosen to include four possible relationships between base forms and de- 
rived forms, on the assumption that these would engender errors that would re- 
flect different levels of mastery of phonological and orthographic rules. 
Included were (a) word pairs in which there is NO CHANGE in the phonology or
orthography (e.g., enjoy and enjoyment), (b) pairs in which there is a
PHONOLOGICAL CHANGE but no orthographic change (e.g., major and majority), (c)
pairs in which there is an ORTHOGRAPHIC CHANGE but no phonological change
(e.g., rely and reliable), and (d) pairs in which phonology and orthography
BOTH CHANGE (e.g., reduce and reduction).

In developing these tests to address the research questions, we 
anticipated two particular patterns of results. First, on the question of the
developmental trends of morphological knowledge and spelling ability, we 
expected performance on the dictated spelling test to lag behind performance 
on the test of oral generation (called the Test of Morphological Structure), 
since the development of morphological knowledge most likely precedes the 
ability to use this knowledge in spelling. Second, on the question of the 
ruleful nature of learning the morphology and spelling of derived words, we
expected that the words undergoing phonological and both phonological and 
orthographic changes would present more difficulty than the words with more 
transparent relationships (those undergoing no change or just orthographic 
change). This expectation was based on the finding of various research
studies that the more remote the relationship between base and derived forms, the
more difficult it is to learn the relationship rulefully (Berko, 1958;
Derwing, 1976; Derwing & Baker, 1979; Moskowitz, 1973; Templeton, 1980).

A final consideration reflected the nature of orthographic rules and the 
derived words whose spelling is governed by these rules. The spelling of such 
words draws on a somewhat different kind of "ruleful" learning: the
conventions of our spelling system. While a knowledge of the morphological
components of words such as "sunny" would make the task of spelling easier,
specific knowledge of the conventions of spelling words with suffixes (such as the
rules governing the doubling of consonants) would also seem to be helpful, if
not necessary. An exploratory study of the mastery of suffix addition rules
between the seventh and ninth grades showed that words with suffixes made up
more than half the errors in the students' compositions (Carlisle, 1984).
Therefore, a test was devised that would help determine whether the students
were able to apply the suffix addition rules consciously. Since the
orthographic changes could be memorized as word-specific spellings, this test
required the addition of suffixes to nonsense words. On the premise that
mastery of the suffix addition rules is dependent on knowledge of morphemic
structure and knowledge of abstract generalizations, the students' ability to
apply these spelling rules was expected to develop later than their knowledge
of morphological structure.
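
The kind of suffix addition rule at issue can be made concrete with a small sketch. The Python below is only an illustration under simplifying assumptions: it encodes textbook versions of three conventions (consonant doubling, y to i, and dropping a final e), restricts doubling to one-syllable bases, and uses a rough letter-based syllable count. The function names and the nonsense items are hypothetical and are not taken from the Test of Suffix Addition.

```python
VOWELS = "aeiou"


def count_syllables(word: str) -> int:
    """Rough syllable estimate: the number of runs of vowel letters."""
    runs, in_run = 0, False
    for ch in word:
        is_vowel = ch in VOWELS
        if is_vowel and not in_run:
            runs += 1
        in_run = is_vowel
    return runs


def add_suffix(base: str, suffix: str) -> str:
    """Attach a suffix using three simplified spelling conventions."""
    vowel_initial = suffix[0] in VOWELS + "y"

    # Convention 1: final y after a consonant becomes i (rely -> reliable),
    # unless the suffix itself begins with i.
    if (base.endswith("y") and len(base) > 1
            and base[-2] not in VOWELS and suffix[0] != "i"):
        return base[:-1] + "i" + suffix

    # Convention 2: a final e is dropped before a vowel-initial suffix
    # (endure -> endurance). Bases ending in "ee" are not handled.
    if base.endswith("e") and vowel_initial:
        return base[:-1] + suffix

    # Convention 3: a one-syllable base ending consonant-vowel-consonant
    # doubles its final consonant before a vowel-initial suffix (sun -> sunny).
    if (vowel_initial and count_syllables(base) == 1 and len(base) >= 3
            and base[-1] not in VOWELS + "wxy"
            and base[-2] in VOWELS and base[-3] not in VOWELS):
        return base + base[-1] + suffix

    # Otherwise the base is left intact (enjoy -> enjoyment).
    return base + suffix


if __name__ == "__main__":
    # Hypothetical nonsense bases, in the spirit of a nonsense-word test.
    for base, suffix in [("zib", "ing"), ("vope", "ing"), ("glidy", "ness")]:
        print(base, "+", suffix, "->", add_suffix(base, suffix))
```

Because the rules operate on letter patterns rather than on stored spellings, they apply equally to real and nonsense bases, which is the property a nonsense-word test relies on.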

Method 

Subjects 

The subjects were 65 students from the fourth, sixth, and eighth grades 
of a rural Connecticut school system. The 22 sixth graders and the 21 eighth 
graders came from classes studying language arts and literature. These were 
selected by the teachers on the basis of class size and availability of time. 
The fourth-grade group was made up of 22 students from two elementary class- 
rooms. All subjects were judged by their teachers as having at least average 
intelligence. 

Procedures 

The Spelling subtest of the Wide Range Achievement Test (WRAT) was
administered to each grade-level group (Jastak & Jastak, 1978). Within a week the
Derived Forms subtest of the Spelling Test was administered to each
grade-level group. One week later the Base Forms subtest of the Spelling Test and the
Test of Suffix Addition were administered. Two weeks after the administration
of the Derived Forms subtest of the Spelling Test, the Test of Morphological 
Structure was administered to each student individually. 

Materials 

1. Wide Range Achievement Test, Spelling Subtest (Jastak & Jastak,
1978). This dictated spelling test was administered to determine the spelling
capabilities of the subjects and to evaluate the validity of the experimental 
spelling tests. Level I was administered to the fourth-grade group, and Level 
II to the sixth- and eighth-grade groups. Level II, the appropriate form to 
use with youngsters aged 12.0 and over, was given to all sixth graders even 
though some of them were not yet 12 in order to permit group administration of 
the test and to insure an accurate comparison of spelling abilities within the 
sixth-grade group. The test was administered in accordance with the direc- 
tions for group administration. The students' performance on the Wide Range 
Achievement Test (WRAT) Spelling subtest yielded the following
grade-equivalent scores: fourth grade, 5.9 (standard deviation of 1.0); sixth grade, 6.7
(standard deviation of 1.4); and eighth grade, 9.4 (standard deviation of 
1.3). The correlation between the students' performances on the WRAT Spelling 
subtest and on the Derived Forms subtest of the Spelling Test (described
hereafter) was .64 (p < .001).

2. Test of Morphological Structure. This experimental test was designed
to assess knowledge of derivational morphology. The test has two subtests.
For the Derived Forms subtest, the student's task was to state a specific
derived form, once the examiner had given the base word and a sentence that
needed the derived form as the final word to complete the sentence. (The
first item on this subtest was: "Warm. He chose the jacket for its ____." The
target response was "warmth.") For the Base Forms subtest, the student's task
was to state the base form, once the examiner had given the derived form and
an appropriate sentence, designed to end with the base form. (The first item
on this subtest: "Growth. She wanted her plant to ____." The target response
was "grow.")

The words on the test (see the Appendix) are based on four types of 
linguistic relationship between the base word and derived form: 

a. NO CHANGE: Neither the phonology nor the orthography of the base
changes in the derived form (e.g., enjoy and enjoyment).

b. ORTHOGRAPHIC CHANGE: The spelling but not the phonology of the base
word changes in the derived form. Three types of changes were included in the
word list: the doubling of a final consonant before the suffix (e.g., sun to
sunny), the transformation of y to i (e.g., rely to reliable), and the
omission of a final e before a suffix beginning with a vowel (e.g., endure to
endurance).

c. PHONOLOGICAL CHANGE: The pronunciation changes in the shift from the
base word to the derived form without an accompanying change in spelling.
Four kinds of phonological change were included: (a) tense to lax vowel
(e.g., heal to health), (b) vowel reduction (e.g., original and originality),
(c) shift in the pronunciation of a consonant (e.g., magic and magician), and
(d) shifts in both a vowel and a consonant pronunciation (e.g., sign and
signal).
d. BOTH CHANGE: Changes in both the orthography and the phonology occur
in the shift from base word to derived form. Among the words of this group
are representatives of different types of phonological shifts, including vowel
shifts, consonant shifts, and shifts in the pronunciation of both vowel and
consonant. Examples of BOTH CHANGE word pairs are deep and depth, decide and
decision, and reduce and reduction.
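
The four-way classification above can be expressed as a small decision procedure. The sketch below is illustrative only and rests on two assumptions that are not part of the study's materials: the suffix for each pair is supplied by hand, and whether the pronunciation of the base is preserved is a hand-coded judgment (`base_sound_preserved`), since it cannot be read off the spelling. An orthographic change is then operationalized as "the derived form is not simply the base plus the suffix."

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WordPair:
    base: str
    suffix: str                  # hand-supplied suffix (an assumption)
    derived: str
    base_sound_preserved: bool   # hand-coded phonological judgment


def classify(pair: WordPair) -> str:
    """Assign a base/derived pair to one of the four word types."""
    # Orthographic change: plain concatenation does not yield the
    # derived spelling (sun + y = "suny", not "sunny").
    ortho_change = pair.base + pair.suffix != pair.derived
    phono_change = not pair.base_sound_preserved

    if ortho_change and phono_change:
        return "BOTH CHANGE"
    if ortho_change:
        return "ORTHOGRAPHIC CHANGE"
    if phono_change:
        return "PHONOLOGICAL CHANGE"
    return "NO CHANGE"


if __name__ == "__main__":
    # Example pairs taken from the text; the flags are the coder's judgment.
    examples = [
        WordPair("enjoy", "ment", "enjoyment", True),
        WordPair("sun", "y", "sunny", True),
        WordPair("rely", "able", "reliable", True),
        WordPair("major", "ity", "majority", False),
        WordPair("sign", "al", "signal", False),
        WordPair("reduce", "tion", "reduction", False),
        WordPair("deep", "th", "depth", False),
    ]
    for pair in examples:
        print(f"{pair.base} -> {pair.derived}: {classify(pair)}")
```

Comparing against base plus suffix, rather than checking whether the derived form merely begins with the base, is what lets consonant doubling (sun to sunny) count as an orthographic change.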

Words for the two subtests, Derived Forms and Base Forms, were selected 
to be as similar as possible in length, frequency, affixation, and similarity 
in meaning of the root word and its derived form. First, words on the two
subtests, type by type, do not differ in word length, as determined by number
of letters. The average length for the base words is 5.6 letters for Derived
Forms and 5.7 letters for Base Forms; the derived forms of both subtests
average 8.5 letters.

Second, an effort was made to ensure the familiarity of the words for 
students in grades four through eight. As a measure of the familiarity of the 
written forms, only words with a Standard Frequency Index rating of 40 or 
above were used (Carroll, Davies, & Richman, 1971). (A Standard Frequency In- 
dex of 40 indicates a word that has an estimated frequency of one in a million 
words.) The words were equated for frequency by word type (NO CHANGE, ORTHO- 
GRAPHIC CHANGE, and so on) on the two subtests, Base Forms and Derived Forms. 
The mean frequencies are as follows: for the base words, 55.1 (SD 1.8) on the 
Base Forms subtest and 55.2 (SD 0.9) on the Derived Forms subtest; for the de- 
rived words 49.6 (SD 2.4) on the Base Forms subtest and 50.6 (SD 1.8) on the 
Derived Forms subtest. 
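
For readers unfamiliar with the scale, the Standard Frequency Index is logarithmic. The sketch below assumes the usual definition, SFI = 10(log10 U + 4), where U is the estimated frequency per million words; that formula is consistent with the statement above that an index of 40 corresponds to one occurrence per million, but the function names are hypothetical and the conversion is offered only as an aid to reading the means.

```python
import math


def sfi_from_per_million(per_million: float) -> float:
    """Standard Frequency Index for an estimated frequency per million words.

    Assumes SFI = 10 * (log10(U) + 4), with U in occurrences per million.
    """
    return 10.0 * (math.log10(per_million) + 4.0)


def per_million_from_sfi(sfi: float) -> float:
    """Inverse conversion: estimated occurrences per million words."""
    return 10.0 ** (sfi / 10.0 - 4.0)


if __name__ == "__main__":
    # The cutoff of 40 and the reported subtest means, converted back to
    # occurrences per million words.
    for sfi in (40.0, 49.6, 50.6, 55.1, 55.2):
        print(f"SFI {sfi:>4}: about {per_million_from_sfi(sfi):.1f} per million")
```

On this reading, each 10-point step on the index is a tenfold change in frequency, so the small differences between the subtest means correspond to modest differences in raw frequency.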

Third, attempts were made to control for semantic distance (i.e., the 
similarity of the meanings of base and derived forms), semantic variations, 
and syntactic options, all factors that can affect the difficulty of generat- 
ing morphological forms. An effort was made to select base and derived forms 
with familiar and similar meanings. The sentences were written in such a way
as to constrain possible choices in meaning and form. Pilot testing was used 
to eliminate items that did not meet these criteria. 

The order of items on each subtest was determined by creating ten sets of 
four items, each set made up of one word of each word type (NO CHANGE, ORTHO- 
GRAPHIC CHANGE, PHONOLOGICAL CHANGE, and BOTH CHANGE). The four word types 
were randomly ordered within each set, and the ten sets were randomly ordered
on the test. 
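
The ordering procedure just described can be sketched in Python. This is only an illustrative sketch under stated assumptions: the function name order_items, the placeholder item labels, and the optional seed are introduced here for illustration and are not part of the original test construction. In particular, the source does not say how individual words were assigned to sets, so random assignment is assumed.

```python
import random

WORD_TYPES = [
    "NO CHANGE",
    "ORTHOGRAPHIC CHANGE",
    "PHONOLOGICAL CHANGE",
    "BOTH CHANGE",
]


def order_items(items_by_type, n_sets=10, seed=None):
    """Build a test order from n_sets sets, each holding one item per word type.

    items_by_type maps each word type to a list of at least n_sets items.
    The seed parameter is a hypothetical convenience for reproducibility.
    """
    rng = random.Random(seed)
    # Assumption: items of each type are assigned to sets at random.
    pools = {t: rng.sample(items_by_type[t], n_sets) for t in WORD_TYPES}
    sets = []
    for i in range(n_sets):
        one_set = [(t, pools[t][i]) for t in WORD_TYPES]
        rng.shuffle(one_set)  # random order of the four word types within a set
        sets.append(one_set)
    rng.shuffle(sets)  # random order of the sets on the test
    # Flatten the sets into a single sequence of (word type, item) pairs.
    return [item for one_set in sets for item in one_set]


# Placeholder labels stand in for the real test words.
items = {t: [f"{t}-{i}" for i in range(10)] for t in WORD_TYPES}
order = order_items(items, seed=1)
```

With ten items per word type, the result is a sequence of 40 items in which every consecutive block of four contains each word type exactly once.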

The test was administered by means of a tape-recording in standard En- 
glish spoken by a native American male speaker. Directions and practice items 
were given by the examiner. The directions indicated that the student was to 
give the form of the word that correctly completed the sentence. Three
practice items were given to all the students; the first, for example, was:
"Farm. My uncle is a ______." The correct response was farmer.

If the student completed the first practice item incorrectly, the correct 
answer was provided. The item was then repeated so that the student could 
give the correct answer. Once the tape was started, the administration con- 
tinued without further assistance. If a student gave no response to a test 
item in the allotted time (5 s between the end of one item and the beginning 
of the next), the tape was stopped, and the student was asked if he or she
could give a form of the word that completed the sentence. After this answer
was recorded, the student was asked to try to give a prompt response to each
item and was reminded that extra time would not be given for other items.

3. The Spelling Test. This dictated spelling test was used to determine
whether the students could spell the same base words and their derived forms
that make up the Derived Forms subtest of the Test of Morphological Structure.
The first subtest (Derived Forms) consists of the 40 derived forms, given by
dictation. The second subtest (Base Forms) consists of the 40 base words, also
presented by dictation. The words appear in random order in each subtest.
The Derived Forms subtest was administered a week before the Base Forms
subtest so that the subjects would not be sensitized to the relationship
between the root and derived forms in spelling the derived forms.

The test was administered by means of a tape-recording in standard
English spoken by a native American male speaker. Each word was presented first
alone, then in a sentence, and finally alone. There was a 10-s lapse between
the last pronunciation of the spelling word and the start of the next item.
The directions and two sample items were given by the examiner orally. The
directions explained the nature of the test and the student's task, including
the procedure for writing the words. The students were told that they
could not pick up their pencils to write the dictated word until the test item
had been completed. The students were directed to "listen carefully to each
word and the way it is used in the sentence." The same directions were used
for the two subtests.

4. Test of Suffix Addition. This test was designed to determine the
extent to which students were able to apply the rules that govern the addition
of suffixes to base words. Nonsense words were used as the base words so that
the correct execution of this task could not be accomplished on the basis of
familiarity. The test consists of 30 nonsense words, each followed by an
addition sign (+), a suffix, an equal sign, and a blank line. (The first item,
for example, is "dun + er = ______.") The nonsense words were constructed from
real words by substituting one consonant for another (dun for run) or one
consonant blend for a consonant or another consonant blend (drim for swim or
prad for sad). In no case was the substituted consonant the final consonant in
the original word.

Each item on the Test of Suffix Addition requires the use of one of the
three suffix rules that form the basis of the ORTHOGRAPHIC CHANGE word type on
the Test of Morphological Structure. There are ten items for each spelling
rule: the rule governing the doubling of a final consonant (called the
"doubling" rule), the final y rule, and the final e rule. In addition, since
sometimes no change is made in the base word when the suffix is added, about
half the words required an orthographic change for correct suffix addition and
the other half did not. The test items assess the knowledge of some fairly
refined aspects of the conventions for suffix addition. For example, "leace +
able = ______" requires the knowledge that the e must be retained to indicate
the "soft" sound of the c in leace (i.e., leaceable). Such items were included
to probe the breadth of the students' knowledge of the rules that govern suffix
addition.
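
The three suffix rules can be made concrete with a small Python sketch. The text names the rules but does not spell them out, so the formulations below are the commonly taught English spelling conventions and should be read as assumptions. The function names are hypothetical, and the doubling rule is deliberately restricted to one-syllable bases because stress in longer words cannot be read off the spelling.

```python
VOWELS = set("aeiou")


def _is_one_syllable_cvc(base):
    """True if the base has one vowel letter followed by one final consonant."""
    vowel_positions = [i for i, ch in enumerate(base) if ch in VOWELS]
    return vowel_positions == [len(base) - 2] and base[-1] not in "wxy"


def add_suffix(base, suffix):
    """Join base and suffix using simplified final y, final e, and doubling rules."""
    # A suffix such as -y is treated like a vowel-initial suffix.
    vowel_suffix = suffix[0] in VOWELS or suffix[0] == "y"

    # Final y rule: consonant + y changes y to i unless the suffix begins with i.
    if len(base) > 1 and base[-1] == "y" and base[-2] not in VOWELS:
        if suffix[0] == "i":
            return base + suffix
        return base[:-1] + "i" + suffix

    # Final e rule: drop a silent e before a vowel-initial suffix, but keep it
    # after c or g when the suffix begins with a or o (the "soft" sound).
    if len(base) > 1 and base[-1] == "e" and base[-2] not in VOWELS:
        keep_e = (not vowel_suffix) or (base[-2] in "cg" and suffix[0] in "ao")
        return base + suffix if keep_e else base[:-1] + suffix

    # Doubling rule (one-syllable bases only in this sketch).
    if vowel_suffix and _is_one_syllable_cvc(base):
        return base + base[-1] + suffix

    # Otherwise no orthographic change is needed.
    return base + suffix


first_item = add_suffix("dun", "er")
soft_c_item = add_suffix("leace", "able")
```

Under these assumptions the sketch gives dunner for "dun + er" and leaceable for "leace + able", matching the answers described in the text. It is not a complete model of English suffixation.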

Directions for this test were given aloud by the examiner, and two
examples were completed on the blackboard to illustrate what was expected of
the students. The directions indicated that the base words were not real words,
but that the students were to put the two parts (base and suffix) together as
if they were real English words.

The student wrote each word on a long blank line following the test item.
There was no time limit. Most students took approximately 10 minutes to
complete the test.

Results 

Developmental Trends in the Learning of Derivational Morphology 

Performances on the Test of Morphological Structure (TMS) were initially
scored by tabulating the number of correct responses for each subtest, Base
Forms and Derived Forms. The mean scores for each subtest of the TMS, given
in Table 1, show that there was an increase in the knowledge of derivational
relationships by grade level.



Table 1

Mean number correct (and SDs) on the Test of Morphological Structure, the
Spelling Test, and the Test of Suffix Addition by grade level

         Test of Morphological                            Test of Suffix
               Structure            Spelling Test            Addition
         Base        Derived       Base        Derived
Grade    forms       forms         forms       forms

  4      30.8        26.3          24.9        14.6            16.0
         (6.9)       (5.4)         (9.3)       (9.8)           (1.0)

  6      35.2        31.9          34.2        26.0            17.9
         (4.1)       (3.8)         (4.1)       (7.5)           (3.3)

  8      39.1        35.7          38.2        34.4            21.0
         (0.7)       (2.4)         (3.0)       (5.3)           (3.7)

Note: Maximum score for TMS and ST = 40.
      Maximum score for TSA = 30.

The performances at the three grade levels were found to be significantly
different for both the Base Forms, F(2,62) = 18.99, p < .001, and the Derived
Forms, F(2,62) = ^6.37, p < .001. Paired comparisons (Scheffé, p < .05)
showed that for both the Base Forms and Derived Forms subtests the fourth
grade was significantly different from the sixth grade and from the eighth
grade, and the sixth grade was significantly different from the eighth grade.
In the eighth grade the students' performance on both subtests was close to
the ceiling level of the test.
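
For readers unfamiliar with the notation, the F ratios reported here come from one-way analyses of variance with grade level as the factor. The Python sketch below shows how such an F ratio and its degrees of freedom are computed. The function name and the toy scores are illustrative assumptions and are not the study's data. For a standard one-way design with three grades, within-group degrees of freedom of 62 correspond to 65 scores in all, since the within-group degrees of freedom equal the number of scores minus the number of groups.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of score lists."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Toy scores for three hypothetical groups (not data from the study).
f_ratio, df_between, df_within = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```

For the toy scores the between-groups mean square is 13 and the within-groups mean square is 1, so the result is F(2, 6) = 13.0.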

Differences in the students' ability to generate the word forms on the
TMS cannot be attributed to differences in word frequency or word length. The
correlations between errors on the TMS and word frequency (taken from the
norms of Carroll et al., 1971) were very low for the Base Forms, r = .14, p =
.20, and for the Derived Forms, r = -.08, p = .43. Correlations between word
length and errors were also very low both for Base Forms, r = -.26, p = .02,
and for Derived Forms, r = .01, p = .90.

The Spelling Test (ST) was subjected to a similar analysis. The
students' performances were scored on the basis of the number of words spelled
correctly on each subtest, Base Forms and Derived Forms. Letters incorrectly
or ambiguously formed were counted wrong. Where the legibility of a letter or
word was questionable, one additional judge scored the word independently.
This procedure effectively removed the few instances of uncertainty.

The increase in mean number of correct spellings on the ST, as shown in
Table 1, is significant for both the Base Forms, F(2,62) = 26.69, p < .001,
and the Derived Forms, F(2,62) = 3^.88, p < .001. Paired comparisons of the
group means (Scheffé, p < .05) indicate that on the Base Forms subtest the
fourth graders differed significantly from the eighth graders, and the sixth
graders differed significantly from the eighth graders. On the spelling of
the Derived Forms the fourth graders differed significantly from the sixth and
eighth graders, but the sixth graders were not significantly different from
the eighth graders. The eighth graders' spelling of the base forms was
practically at a ceiling level, although their spelling of the derived words
was somewhat less proficient.

As would be expected from other investigations of spelling skills (see
Cahen, Craun, & Johnson, 1971), the correlation between word length and
spelling errors and the correlation between word frequency and spelling errors
were low to moderate. For both the base words and the derived words, the
correlation of word length with spelling errors was .49 (p < .01). The
correlation of the frequency of base words with errors on base words was -.34,
and the correlation of frequency of derived forms with errors on derived forms
was -.37 (p < .05).

Developmental trends based on the relative difficulty of the TMS and ST
subtests were also found. On both tests, the performances on the Base Forms
subtests were significantly better than the performances on the Derived Forms
subtests: for the ST, t(64) = 13.23, p < .001; for the TMS, t(64) = 3.90, p <
.001. The superior performance on the Base Forms subtests suggests that the
ability to extract the base word from its derived form is developed before the
ability to generate the derived form from the base form. Similarly, the
spelling of the base words appears to be mastered before the spelling of their
derived counterparts.

Relationship Between Knowledge of Derivational Morphology and Spelling Ability

The second research question concerned the relationship between learning
derivational morphology and learning to spell derived words. In order to
determine the extent to which performance on the Base Forms and Derived Forms
subtests of the TMS and ST accounted for variance in the performance at the
three grade levels, a discriminant function analysis was carried out. This
analysis generated one function that accounted for 94.8% of the variance
(Wilks' Lambda 0.3680109 at a significance level of 0.0000). (A second
function was not significant.) The standardized canonical coefficients of this
function are as follows: TMS, Derived Forms u.82l7j; TMS, Base Forms -0.59577;
ST, Derived Forms 0.89^84; ST, Base Forms 0.01003. The particularly high
loadings of this function are on the Derived Forms subtests of the TMS and ST,
suggesting that knowledge of derived forms more strongly distinguished the
three grade levels than knowledge of base forms. This function correctly
predicted the grade level of 69.23% of the group.

A second method was used to investigate the sensitivity to morphological
structure in spelling derived words. The students' spelling of each word
pair, the base form and its derived counterpart, was tabulated. Performance
on each pair was figured according to four possible patterns: both base and
derived forms incorrect (e.g., equl and equity), base correct but derived
incorrect (e.g., begin but begginer for beginner), base incorrect but derived
correct (e.g., expens for expense but expensive), and both base and derived
correct (e.g., explain and explanation).
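
The four-way tabulation can be sketched in Python as follows. This is a minimal illustration, assuming each word pair has already been scored as a pair of correct/incorrect judgments. The names classify_pair and pattern_percentages and the example scores are hypothetical and do not come from the study.

```python
from collections import Counter

PATTERNS = (
    "both incorrect",
    "only base correct",
    "only derived correct",
    "both correct",
)


def classify_pair(base_correct, derived_correct):
    """Assign one scored word pair to one of the four patterns."""
    if base_correct and derived_correct:
        return "both correct"
    if base_correct:
        return "only base correct"
    if derived_correct:
        return "only derived correct"
    return "both incorrect"


def pattern_percentages(pairs):
    """Return each pattern's share of all pairs, as a percentage of the total."""
    counts = Counter(classify_pair(b, d) for b, d in pairs)
    total = len(pairs)
    return {p: 100.0 * counts[p] / total for p in PATTERNS}


# Hypothetical scores for four word pairs: (base correct?, derived correct?).
example = [(True, True), (True, False), (False, False), (True, True)]
percentages = pattern_percentages(example)
```

Expressing each pattern as a percentage of the total possible pairs, as the analysis does, makes the grade levels directly comparable.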



[Figure 1 (bar chart): percentage of word pairs, on a scale from 0 to 100,
falling into each of four categories (base and derived incorrect; only base
correct; only derived correct; base and derived correct), shown separately for
the 4th, 6th, and 8th grades.]

Figure 1. Comparison of correct and incorrect spellings of word pairs, base
and derived forms, by grade level.



The results of this analysis, shown in Figure 1, give performance on
pairs of words as a percentage of the total possible. One-way analysis of
variance showed that the instances in which the students were able to spell
both the base and derived words correctly increased significantly by grade
level, F(2,62) = 34.51, p < .001. Paired comparisons of the group means
(Scheffé, p < .05) indicate that the fourth grade was significantly different
from the sixth and eighth grades and that the sixth grade was significantly
different from the eighth grade.

Furthermore, as is evident from an examination of Figure 1 , generally 
speaking, correct spelling of the base word is a precondition for correct 
spelling of the derived word. While fourth- and sixth-grade students quite 
commonly misspelled the derived word but spelled the base word correctly, the 
reverse pattern was extremely uncommon: these students very seldom misspelled
the base word and yet spelled the derived word correctly. The instances in 
which the base word was correct but the derived form was incorrect diminish 
markedly by the eighth grade — an indication of rapid learning of the spelling 
of derived forms by this grade level.

Performance on TMS and ST as a Reflection of Word Type 

The third research question concerned the ruleful learning of
derivational morphology and the extent to which such knowledge appears to be
used in spelling derived words. To investigate this question, the experimental
tests included four types of word relationships reflecting the kinds of
transformations commonly found between base and derived words. As described
earlier, these word types are No Change (NC), Orthographic Change (OC),
Phonological Change (PC), and Both Change (BC). The premise was that the more
complex relationships, involving mastery of phonological and orthographic
rules, would generate more errors than the more transparent relationships and
would be mastered somewhat later. For the TMS, performances on both the Base
Forms and Derived Forms subtests showed a pattern of performance by word type,
in general reflecting more difficulty with the relationships that required
phonological and/or both orthographic and phonological changes, as can be seen
in Figure 2.



In order to determine the extent to which the four word types (No Change,
Orthographic Change, and so on) of the two TMS subtests (Base Forms and
Derived Forms) accounted for variance in the performance at the three grade
levels, a discriminant function analysis was carried out. This analysis
generated one function that accounted for 89.23% of the variance (Wilks'
Lambda O.M20459 at a significance level of 0.0001). (The second function was
not significant.) The standardized canonical coefficients of this function,
shown in Table 2, indicate that the highest loading is on the Phonological
Change word type of the Base Forms subtest, with moderate loadings on most of
the remaining word types (the exceptions being the No Change and Orthographic
Change word types of the Base Forms test). This function correctly predicted
the grade level of 73.85% of the group.

As on the TMS, the students' spelling performance on the Derived Forms
subtest of the ST was analyzed by word type. The question is whether
students' success in spelling derived words is a reflection of the type of
transformation between the base and derived form. (The words on the Base
Forms subtest cannot be analyzed in the same way, since the dictated word is a
single base morpheme, and there was nothing in the task to encourage the
speller to consider morphological relationships.) An examination of the
spelling errors on the four word types of the Derived Forms subtest of the ST
(shown in Figure 2) indicated that the mean number of errors differed
significantly by grade level: for NC, F(2,62) = 24.30, p < .001; for OC,
F(2,62) = 19.36, p < .001; for PC, F(2,62) = 28.30, p < .001; for BC,
F(2,62) = 50. U7, p < .001. Paired comparisons (Scheffé, p < .05) indicated
that for each word type the fourth grade differed significantly from the sixth
and eighth grades, and the sixth grade differed significantly from the eighth
grade.



Table 2

Standardized canonical discriminant function coefficients for the word types
on the Base Forms and Derived Forms subtests, TMS

Base Forms       No Change               0.04983
                 Orthographic Change     0.05298
                 Phonological Change     0.63323
                 Both Change             0.38875

Derived Forms    No Change               0.40498
                 Orthographic Change    -0.26030
                 Phonological Change    -0.19539
                 Both Change             0.20406



[Figure 2 (bar charts): mean errors on the NC, OC, PC, and BC word types in
three panels, two labeled Oral Generation and one labeled Spelling, Derived
Forms. NC = No Change; OC = Orthographic Change; PC = Phonological Change;
BC = Both Change.]

Figure 2. Mean errors by grade level on word types of three experimental
tasks.

The two Derived Forms subtests most directly assess knowledge of
transformations between base and derived forms. Consequently, the two Derived
Forms subtests, TMS and ST, were analyzed in order to determine the extent to
which the four word types (No Change, Orthographic Change, and so on) on the
two Derived Forms subtests accounted for variance in performance at the three
grade levels. A discriminant function analysis generated one function that
accounted for 93.10% of the variance (Wilks' Lambda 0.2867654 at a
significance level of 0.0000). (A second function was not significant.) The
standardized canonical coefficients of this function, shown in Table 3,
indicated high loadings on the Phonological Change word type of the TMS and
the Both Change word type of the ST, suggesting that these were particularly
important in accounting for the differences in performance by grade level.
Both draw on knowledge of phonological rules, whether for generating or
spelling derived forms. This function correctly predicted the grade level of
76.92% of the group.



Table 3

Standardized canonical discriminant function coefficients for the word types
on Derived Forms subtests, TMS and ST

TMS    No Change              -0.07509
       Orthographic Change     0.09502
       Phonological Change     0.43693
       Both Change             0.04723

ST     No Change              -0.25244
       Orthographic Change    -0.14684
       Phonological Change     0.17305
       Both Change             0.92650



Analysis of Types of Errors on the TMS 

Analysis of errors on the TMS provided further insight into the mastery
of the rulefulness of derivational morphology. The decision to analyze the
types of errors on the TMS (Derived Forms subtest) arose from the observation
of patterns among the students' incorrect responses. The errors fell naturally
into four categories: BASE ONLY for no response other than repetition of
the base word (e.g., sign for sign ); RULEFUL for ruleful but nonexistent words
(e.g., revisement for revision); UNUSUAL for unusual but possible answers
(e.g., healing instead of health in response to the item, "Heal. His sister
was worried about his ______."); and INAPPROPRIATE for nonruleful, nonexistent
words (e.g., consumeration for consumption) or for existing words that were
inappropriate answers (e.g., glorify instead of glorious in response to the
item, "Glory. The view from the hill top was ______."). Table 4 shows the
average number of errors in each category made at the three grade levels.

Table 4

Mean errors (and SDs) on error types of the Derived Forms subtest, Test of
Morphological Structure, by grade level

                            Error types
Grade    Base Only    Ruleful    Unusual    Inappropriate

  4        4.7          2.5        2.3          3.5
          (3.3)        (3.3)      (1.6)        (4.4)

  6        3.4          1.0        1.5          1.8
          (2.9)        (1.2)      (1.2)        (1.5)

  8        1.0          0.7        1.5          0.8
          (1.2)        (0.8)      (1.1)        (0.7)

Analysis of variance showed that the errors in three of the categories
decreased significantly by grade level: the BASE ONLY errors, F(2,62) = 10.66,
p < .001, the RULEFUL errors, F(2,62) = 4.46, p < .05, and the INAPPROPRIATE
errors, F(2,62) = 0.03, p < .01. The UNUSUAL errors were not significantly
different by grade level. Further examination was made of two of the error
types that seemed to be of particular interest: the RULEFUL errors and the
UNUSUAL errors. Ninety-one RULEFUL errors (17% of the total) were made
altogether: 60.4% by fourth graders, 24.2% by sixth graders, and 15.4% by
eighth graders. Perhaps more revealing than the number of errors is the
nature of the RULEFUL errors. Eighty-two percent of the errors were made on
words that undergo a phonological change (with or without an accompanying
orthographic change) in their derived forms (e.g., revise to revision). For
97% of these errors, the version given preserved the phonological identity of
the base word (e.g., revisement instead of the target word, revision).

In addition, the students seemed to show a preference for certain
suffixes in creating their RULEFUL errors. Most popular was -ment (accounting
for 58% of the errors), followed by -ance, -tion, -ness, and -less. All of
these suffixes were used to create a derived form without a phonological
change in the base word. There is no reason to believe that the students were
biased toward the use of any particular suffix by the other words on the test.
For instance, the only test item with -ment as a suffix is the word enjoyment.

Unlike the other error categories, the UNUSUAL errors did not diminish
significantly between the fourth and eighth grades. Analysis of the words on
which such errors were made, as well as the kinds of responses given, indicates
that the UNUSUAL errors occurred with the presentation of specific base words
and sentences. Most (80%) of these errors occurred in generating the derived
words from the following base words: warm, deep, equal, active, consume, and
heal. Four of these six undergo a phonological change from the base to the
target derived form, and yet most of the responses retained the sound of the
base word (e.g., healing instead of the target word, health). The responses
are unusual in that they suit the sentence but were not anticipated as likely
answers, given the structure of the sentence. In this respect, the UNUSUAL
answers are acceptable if not ideal and may reflect an inability to associate
the base and derived word forms. We cannot infer that the students did not
know the words health or consumption.

Performance on the Test of Suffix Addition

The students' mastery of orthographic "rules" was examined by means of
the Test of Suffix Addition (TSA). The students' scores on the TSA consisted
of the number of correct responses. In several instances, responses were
written with a letter omitted, substituted, or placed in the wrong order in a
part of the base word that was not essential to the suffixation. Such answers
were not counted as incorrect if the suffix was correctly attached (e.g.,
belndish for blendish). However, where the miscopying of a base word in any
way affected the addition of a suffix or where the suffix itself was
misspelled, the answer was counted as wrong (e.g., pludding for pludying).

The students' performance on the TSA, shown in Table 1, indicates
improvement in the ability to add suffixes to nonsense words, following the
-y, -e, and doubling rules. The scores also show that even at the
eighth-grade level, the students have not fully mastered the suffix addition
rules. The difference between grade levels was significant, F(2,62) = 10.25,
p < .001. Paired comparisons of the group means (Scheffé, p < .05) show that
the fourth and sixth grades were significantly different from the eighth
grade, but that the fourth grade was not significantly different from the
sixth grade. The more pronounced growth appears to take place between the
sixth and eighth grades.

Discussion 

This study set out to investigate the knowledge of derivational
morphology at the fourth-, sixth-, and eighth-grade levels and to investigate
the extent to which this knowledge is reflected in the students' spelling of
derived words. The results of the study have shown that students appear to
learn a great deal about derivational morphology between the fourth and eighth
grades. Their knowledge reflects varied levels of understanding of the
underlying phonological rules and the orthographic rules that govern the
transformations from base to derived forms. In addition, there are some
indications that students learn to spell derived forms by reference to
morphemic structure. Still, the spelling of derived forms lags behind the
knowledge of these forms. Even by the eighth grade, students do not have a
full mastery of the more complex transformations between base and derived
forms or of the suffix addition rules.

Developmental Trends in the Learning of Derivational Morphology

Significant growth toward mastery was found on each of the three tasks
that assessed morphological knowledge: the generation of base and derived
forms and the spelling of the derived forms. The test results yield some
indication of the order in which different skills are acquired. First, the
ability to extract base forms from derived forms was mastered before the
ability to generate derived forms from base forms. Second, the ability to spell
base words was mastered before the ability to spell their derived
counterparts. Third, the ability to produce the correct base and derived
forms orally was generally superior at each grade level to the ability to
spell the base and derived forms. Finally, application of the suffix addition
rules was not fully mastered by the eighth grade, although the students'
ability to apply these rules improved significantly between the sixth and
eighth grade.

The task of extracting the base word (given the derived form and an ap- 
propriate sentence context) requires the ability to analyze morphemic struc- 
ture, while the task of generating the correct derived form (given the base 
form and an appropriate sentence context) involves an awareness of the syntac- 
tic and semantic form suitable for a particular sentence context. This aware- 
ness, in turn, depends on a knowledge of the available and acceptable forms of 
a given word (such as equality instead of equalness ). The students differed 
significantly in their proficiency on these two tasks — the Base Forms and De- 
rived Forms subtests of the Test of Morphological Structure (TMS). It is 
evidently easier to analyze the morphemic structure of derived forms than it 
is to produce an appropriate derived form. While the two tasks differ in
difficulty, the mean scores on both subtests increase significantly by grade
level. Improvement on the Derived Forms subtest was particularly dramatic, as
the fourth graders had a mean score of 26.3 correct (the maximum possible
being 40), while the eighth graders had a mean score of 35.7 correct (see Table
1). The eighth graders approached the ceiling level on both subtests of the
TMS, which gives an indication of the point at which students become competent
at analyzing the morphemic structure of derived words and knowing the proper
word forms, given words of this level of difficulty.

Since the words on the two subtests were chosen to be equally familiar, 
we can surmise that the particular source of difficulty in learning deriva- 
tional morphology is less learning to analyze morphemic structure of derived 
words than learning appropriate derived word forms. One important aspect of 
this contrast may be the different demands each of the tasks makes on an 
individual. It is likely that production of a word form is more taxing than 
analysis of the structure of a given word. However, this general observation
needs to be examined in regard to individual differences in performance. For 
individuals who have trouble understanding the morphemic structure of words 
(Wiig, Semel, & Crouse, 1973), the two tasks might be equally challenging.

Learning to Spell Derived Forms 

Comparisons of the students' performances on the TMS and the Spelling 
Test (ST) confirm our expectation that spelling is a more difficult task than 
orally generating word forms. It is not surprising that skill in spelling de- 
rived forms appears to develop later than skill in generating derived forms. 

Performances on the ST suggest that spelling derived forms draws on a
knowledge of morphological relationships. When spelling performances on each
word pair (the base and its derived form) were analyzed, mastery of the
spelling of derived forms seemed to depend on initial learning of the spelling
of the base form. As Figure 1 shows, students very seldom spelled a derived
form correctly when they spelled the base form incorrectly, whereas they quite
commonly misspelled the derived form when they had spelled the base form
correctly. It is unlikely that this pattern would be so pronounced if derived
words were learned as unanalyzed whole words. In addition, there is a
relatively small decrease in the percentage of the instances in which the base
is correct but the derived form is incorrect (29% to 11%); in contrast, there
is a very large increase in the percentage of instances in which both the base
and derived forms are spelled correctly (33% to 87%). This suggests a rapid
improvement in the ability to manage the "derived" part of the derived forms
(including orthographic and phonological transformations), along with an
improvement in the ability to spell the base forms.

Another indication of the use of morphemic analysis in spelling of derived
forms comes from the students' performance on the Test of Suffix Addition.
Since this test involves adding suffixes correctly to nonsense words, it
requires explicit knowledge of specific suffix conventions (those governing
the addition of suffixes to words ending in a silent e, in y, and in a single
consonant). These suffix rules, which differ from the linguistic rules that
govern phonological and orthographic transformations, are appropriately viewed
as conventions of writing that govern the correct spelling of both inflected
and derived forms. They are most likely learned by observation of the
patterns of suffix addition in the orthography or by direct instruction in
school. (The alternative would be memorization of the sequence of letters
used to spell each derived word, an unwieldy system, given the large number of
derived words the students are learning and can use in their writing.) The
students' performances on this test show an improvement in the mastery of the
three suffix rules, particularly between the sixth and eighth grades. (The
fourth graders' performance was not significantly different from that of the
sixth graders.) Still, even the eighth graders had not mastered the rules
completely. As was anticipated, the learning of these suffix addition rules
seems to take place later than the mastery of the morphological structure of
words.
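
The three suffix conventions discussed above can be illustrated with a short sketch. It is not the scoring procedure of the Test of Suffix Addition; it assumes simplified textbook versions of the rules (drop a silent e before a vowel-initial suffix, change y to i after a consonant, double a single final consonant after a single vowel) and ignores stress and exceptions. The function name add_suffix and the nonsense items are hypothetical.

```python
VOWELS = set("aeiou")


def add_suffix(base: str, suffix: str) -> str:
    """Join base and suffix using three simplified English suffix conventions.

    Assumption: stress and lexical exceptions are ignored, so the sketch is
    only meant for short, regular items of the kind used as nonsense words.
    """
    vowel_suffix = suffix[0] in VOWELS or suffix == "y"

    # Convention 1: drop a silent e before a suffix that begins with a vowel.
    if (base.endswith("e") and len(base) > 2
            and base[-2] not in VOWELS and vowel_suffix):
        return base[:-1] + suffix

    # Convention 2: change y to i after a consonant, unless the suffix
    # itself begins with i.
    if (base.endswith("y") and len(base) > 1
            and base[-2] not in VOWELS and not suffix.startswith("i")):
        return base[:-1] + "i" + suffix

    # Convention 3: double a single final consonant that follows a single
    # vowel when the suffix begins with a vowel.
    if (len(base) >= 3 and vowel_suffix
            and base[-1] not in VOWELS and base[-1] not in "wxy"
            and base[-2] in VOWELS and base[-3] not in VOWELS):
        return base + base[-1] + suffix

    return base + suffix


if __name__ == "__main__":
    # Real words from the Appendix, then two hypothetical nonsense items.
    for base, suffix in [("endure", "ance"), ("happy", "ness"), ("swim", "er"),
                         ("enjoy", "ment"), ("zake", "ing"), ("blim", "ing")]:
        print(base, "+", suffix, "->", add_suffix(base, suffix))
```

Because the sketch works from letter patterns alone, it applies equally to nonsense items, which is the property that lets a suffix-addition test probe rule knowledge rather than memorized spellings.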

Ruleful Learning of Derivational Morphology 

The words on the TMS represent four types of transformations between base
and derived word forms. These are: NO CHANGE in the orthography and phonology
(e.g., enjoy to enjoyment), ORTHOGRAPHIC CHANGE only (e.g., rely to
reliable), PHONOLOGICAL CHANGE only (e.g., major to majority), and BOTH CHANGE,
the orthography and the phonology (e.g., reduce to reduction). The NO CHANGE
word type represents the most transparent relationship, while the BOTH CHANGE 
word type represents the most obscure relationship. Analysis of the test re- 
sults suggests that the nature of the transformation between base and derived 
forms affected the accessibility of knowledge of morphological relatedness, as 
was expected (see Figure 2). On the Base Forms part of the TMS the NO CHANGE 
words had the fewest errors and the BOTH CHANGE words had the most, while 
PHONOLOGICAL CHANGE words and ORTHOGRAPHIC CHANGE words fall between these two 
extremes. On the Derived Forms subtest the PHONOLOGICAL CHANGE and BOTH 
CHANGE words gave much more difficulty than the NO CHANGE and ORTHOGRAPHIC 
CHANGE words. 
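
The four-way classification of base-derived pairs can be written down as a small data structure. The sketch below is illustrative only: the two change flags are hand-coded from the examples given in the text rather than computed, and the names DerivationPair and word_type are hypothetical.

```python
from typing import NamedTuple


class DerivationPair(NamedTuple):
    base: str
    derived: str
    orthographic_change: bool   # spelling of the base altered in the derived form
    phonological_change: bool   # pronunciation of the base altered in the derived form


def word_type(pair: DerivationPair) -> str:
    """Map the two change flags onto the four word types used on the TMS."""
    if pair.orthographic_change and pair.phonological_change:
        return "BOTH CHANGE"
    if pair.orthographic_change:
        return "ORTHOGRAPHIC CHANGE"
    if pair.phonological_change:
        return "PHONOLOGICAL CHANGE"
    return "NO CHANGE"


# Flags are hand-coded from the examples cited in the text.
EXAMPLES = [
    DerivationPair("enjoy", "enjoyment", False, False),
    DerivationPair("rely", "reliable", True, False),
    DerivationPair("major", "majority", False, True),
    DerivationPair("reduce", "reduction", True, True),
]

if __name__ == "__main__":
    for pair in EXAMPLES:
        print(f"{pair.base} -> {pair.derived}: {word_type(pair)}")
```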

A discriminant function analysis of the four word types on the two sub- 
tests of the TMS (Base Forms and Derived Forms) yielded one significant func- 
tion that accounted for over 89% of the variance, indicating the power of 
these variables in distinguishing the students at the three grade levels.
Contributing to the power of this function were all of the word types except 
the NO CHANGE and ORTHOGRAPHIC CHANGE types on the Base Forms subtest, possi- 
bly indicating that general mastery of the system of transformations distin- 
guished the three grade levels. 






Carlisle: Relationship Between Derivational Morphology and Spelling Ability 






Performance by word type was considered a particularly important indication
of ruleful learning on the two Derived Forms subtests: the oral generation
task (Derived Forms subtest of the TMS) and spelling (Derived Forms subtest
of the ST). The four word types on these subtests were included in a
discriminant function analysis. This analysis yielded one significant function
that accounted for 93% of the variance. Of the standardized canonical
coefficients, the heaviest loading was on the BOTH CHANGE word types of the
ST, the second strongest contributor being the PHONOLOGICAL CHANGE word type
of the TMS. These results suggest that mastery of both phonological and
orthographic rules in spelling most strongly distinguishes the grade levels.
Knowledge of the underlying phonological rule system also discriminates the
three grade levels in performance on derived words, whether the task be oral
generation or spelling.

Despite these findings, performance on the Derived Forms subtest of the
Spelling Test shows that the distribution of errors by word type is relatively
even, a pattern evident at all three grade levels (see Figure 2). There are
several possible reasons for the modest effect by word type in spelling.
First, the two tasks (oral generation and dictated spelling) are very
different in one important respect. In generating derived forms, the student
had no choice but to work with the morphemic structure of the word. However, in
spelling the derived forms, the students were given the derived word by
dictation, and so the task did not require them to deal with the word's
morphemic structure. The fact that there is any consistency in the effects of
word type by grade level suggests that some knowledge of orthographic and
phonological transformations, at least, plays a role in the process of spelling
derived forms. Second, spelling is a complex skill, offering many
opportunities for error. Clearly, the difficulty of spelling a derived word is
not simply a reflection of its word type.

Finally, analysis of the kinds of errors students made on the Derived
Forms of the TMS gives additional support to the argument that the nature of
the transformations between base and derived forms affects the ease of
mastering morphological relationships. The two error types selected for
detailed analysis (the RULEFUL errors and the UNUSUAL errors) were found to
fall primarily on those words that belonged to the PHONOLOGICAL CHANGE or BOTH
CHANGE word types. The students' most common error was a form of the word
that retained the sound of the base word, whether the response was an actual
word or a ruleful invention. This pattern suggests that the younger students
know something about the system of forming derivatives but have not yet
learned all of the appropriate phonological changes. In fact, a large
proportion of their errors showed a resistance to making phonological changes
in giving derived forms. The students often simply added one of the more common
and familiar suffixes ("all-purpose" suffixes such as -ment) to the base word.
For example, a number of students spontaneously invented the form producement,
not knowing or not recognizing the morphological relationship of the correct
response, production.

The RULEFUL and the UNUSUAL errors were in many respects quite similar;
the UNUSUAL errors were differentiated primarily because they were existing
English words, while the responses that made up the RULEFUL errors could be
English words but for whatever reason are not; such is the complexity of
derivational morphology. Thus, even the students who do not have a complete
understanding of the complex transformations still understand some basic
principles about how the system of derivational morphology works.

Instructional Implications



This study provides some evidence that sixth- and eighth-grade students 
draw on their understanding of the morphemic structure of the words to guide 
their spellings of derived words. First, there was a strong relationship
between correct spelling of the base words and correct spelling of their derived
counterparts. We might surmise that along with learning how to spell the base 
words, the students are acquiring morphological awareness — a sensitivity to 
word relationships and an inclination to use knowledge of morphological rela- 
tionships in spelling. Second, the students demonstrated improved ability to 
apply the orthographic rules that govern suffix addition, indicating that gen- 
eral principles are learned and applied to the spelling of derivatives. 
Nonetheless, the spelling of derived words lagged behind mastery of the system 
of the transformations between base and derived forms. The test results sug- 
gest that although a student may demonstrate an understanding of morphemic 
structure when asked to analyze words, he or she may not put this knowledge to
use on a dictated spelling test of derived words, particularly where there are 
phonological and orthographic transformations. 

Since the students demonstrate some productive knowledge of derivational 
morphology, they have the potential, given suitable instruction, to develop an 
explicit awareness of the relationship between the word forms and their spel- 
lings. However, even the eighth graders still have not mastered the spelling 
of PHONOLOGICAL CHANGE and BOTH CHANGE derived words and the suffix addition 
rules. It seems likely that students in the fourth through eighth grades 
might benefit by spelling instruction that explicitly emphasizes morphological 
relationships and the principles that govern the addition of suffixes. One 
training study has been done that suggests the particular benefits of a 
morphemically-based spelling program (Robinson & Hesse, 1981). The
seventh-grade students who received training in the morphemic structure of words
showed more improvement than a control group in general spelling performance 
and in specific performance on morphemically complex words. 

Poor spellers and learning-disabled students, who have been found to be
deficient in their understanding of morphological rules (Wiig et al., 1973),
might benefit particularly from intensive and explicit instruction in the
morphemic structure of words. In a school system whose spelling program
includes instruction in morphemic analysis and spelling rules, both good and
poor spellers showed gradual improvement in their spelling of words with
suffixes between the seventh, eighth, and ninth grades (Carlisle, 1984).
However, the poor spellers continued to lag well behind their peers. They seem
to need more intensive instruction over a longer period of time to make
significant improvement in their ability to spell words with suffixes.

Explicit instruction in morphological relationships, including
phonological and orthographic transformations, might enhance both the students'
understanding of the structure of the language and their ability to spell
derived words. Such instruction could commence at the fourth-grade level.

Appendix
Test of Morphological Structure

      Derived Forms Subtest          Base Forms Subtest
      Given        (Target Response) Given         (Target Response)

No Change
  1.  warm         (warmth)          growth        (grow)
  2.  enjoy        (enjoyment)       employment    (employ)
  3.  appear       (appearance)      difference    (differ)
  4.  care         (careful)         fearful       (fear)
  5.  final        (finally)         usually       (usual)
  6.  profit       (profitable)      remarkable    (remark)
  7.  perform      (performance)     assistance    (assist)
  8.  humor        (humorous)        dangerous     (danger)
  9.  honest       (honesty)         royalty       (royal)
 10.  precise      (precisely)       extremely     (extreme)

Orthographic Change
  1.  sun          (sunny)           foggy         (fog)
  2.  swim         (swimmer)         runner        (run)
  3.  begin        (beginner)        propeller     (propel)
  4.  endure       (endurance)       guidance      (guide)
  5.  active       (activity)        density       (dense)
  6.  adventure    (adventurous)     continuous    (continue)
  7.  expense      (expensive)       sensitive     (sense)
  8.  happy        (happiness)       emptiness     (empty)
  9.  glory        (glorious)        furious       (fury)
 10.  rely         (reliable)        variable      (vary)

Phonological Change
  1.  equal        (equality)        humanity      (human)
  2.  original     (originality)     personality   (personal)
  3.  drama        (dramatic)        periodic      (period)
  4.  magic        (magician)        musician      (music)
  5.  protect      (protection)      election      (elect)
  6.  express      (expression)      discussion    (discuss)
  7.  electric     (electricity)     publicity     (public)
  8.  sign         (signal)          national      (nation)
  9.  major        (majority)        popularity    (popular)
 10.  heal         (health)          cleanly       (clean)

Both Change
  1.  deep         (depth)           width         (wide)
  2.  type         (typical)         athletic      (athlete)
  3.  explain      (explanation)     combination   (combine)
  4.  produce      (production)      reduction     (reduce)
  5.  permit       (permission)      admission     (admit)
  6.  expand       (expansion)       extension     (extend)
  7.  absorb       (absorption)      description   (describe)
  8.  revise       (revision)        recognition   (recognize)
  9.  decide       (decision)        division      (divide)
 10.  consume      (consumption)     assumption    (assume)

RELATIONS AMONG REGULAR AND IRREGULAR, MORPHOLOGICALLY-RELATED WORDS IN THE
LEXICON AS REVEALED BY REPETITION PRIMING*



Carol A. Fowler,† Shirley E. Napps,†† and Laurie B. Feldman†††



Abstract. Several experiments examined repetition priming among
morphologically related words as a tool to study lexical organization.
The first experiment replicated a finding by Stanners,
Neiser, Hernon, and Hall (1979) that whereas inflected words prime
their unaffixed morphological relatives as effectively as do the
unaffixed forms themselves, derived words are effective, but weaker,
primes. The experiment also suggested, however, that this difference
in priming may have an episodic origin relating to the less
formal similarity of derived than of inflected words to unaffixed
morphological relatives. A second experiment reduced episodic
contributions to priming and found equally effective priming of
unaffixed words by themselves, by inflected relatives, and by derived
relatives. Two additional experiments found strong priming
among relatives sharing the spelling and pronunciation of the
unaffixed stem morpheme, sharing spelling alone, or sharing neither
formal property exactly. Overall, results were similar with auditory
and visual presentations. Interpretations that repetition
priming reflects either repeated access to a common lexical entry or
associative semantic priming are both rejected in favor of a lexical
organization in which components of a word (e.g., a stem morpheme)
may be shared among distinct words without the words themselves, in
any sense, sharing a "lexical entry."

Words presented for lexical decision are more rapidly classified, and
words presented under poor viewing or listening conditions are more readily
reported, if they have been presented previously in the experimental setting
than if they have not (e.g., Forbach, Stanners, & Hochhaus, 1974; Murrell &
Morton, 1974; Scarborough, Cortese, & Scarborough, 1977). We will refer to
this general outcome as "repetition priming." Morton (e.g., 1981) and
Stanners, Neiser, Hernon, and Hall (1979) have interpreted repetition priming
as a consequence of repeated access to a lexical entry. Other research has
identified both episodic (Feustel, Shiffrin, & Salasoo, 1983; Jacoby & Dallas,
1981) and strategic (Forster & Davis, 1984; Oliphant, 1983) components to the
priming effect as well.

The lexical interpretation is of particular interest in light of patterns
of priming that are observed among morphologically related words. Priming may
occur in two forms that we refer to as "full" and "partial." Full priming is
priming of one word by another that is as large, statistically, as priming of 
a word by itself. Partial priming is priming of one word by another that is 
present, statistically, but is significantly less than priming of a word by 
itself. Generally, the findings are that priming of a base word by regularly 
inflected morphological relatives is full, while priming by derived forms is 
partial (Stanners et al., 1979). Priming by irregularly affixed words may be 
partial (Stanners et al., 1979) or absent (Kempley & Morton, 1982). 

Stanners et al. interpret full priming as evidence that stem forms and 
inflected relatives share a lexical entry; they interpret partial priming as 
evidence that stem forms and derived words are neighbors in the lexicon. This 
pattern of priming and its interpretation are appealing in supporting plausi- 
ble roles for lexical entries in language use. One role has repetition prim- 
ing as a by-product; a second role gives repetition priming its patterning. 

In Morton's theory of the lexicon (1969, 1981), lexical entries are
"logogens" which collect evidence for the occurrence in stimulation of the
words they represent. Sufficient evidence, exceeding a logogen's threshold,
causes the logogen to "fire." As one consequence of firing, the threshold is
lowered temporarily so that less evidence is necessary for firing if the word
is presented a second time. The threshold rises very slowly over time.
Thresholds of frequent words are kept permanently lowered by the frequent
recurrence of the words in stimulation. The frequency-sensitive thresholds of
logogens explain repetition priming, but more usefully for language users,
they prepare language users for perception of words most likely to occur in
the environment. In this role, repetition priming is a by-product of the
normal operation of the logogen system.
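
The threshold mechanism just described can be made concrete with a toy simulation. This is a minimal sketch under stated assumptions, not Morton's model: the class name Logogen, the linear recovery rule, and all numerical values (resting threshold, size of the drop, recovery rate) are hypothetical and chosen only to show the qualitative behavior.

```python
class Logogen:
    """Toy evidence-collecting unit with a threshold that drops after firing."""

    def __init__(self, word, resting_threshold=1.0, drop=0.3,
                 recovery_per_step=0.001):
        self.word = word
        # Assumption: a frequent word would simply be given a lower resting value.
        self.resting_threshold = resting_threshold
        self.drop = drop                            # hypothetical size of the lowering
        self.recovery_per_step = recovery_per_step  # hypothetical slow recovery rate
        self.threshold = resting_threshold

    def present(self, evidence):
        """Return True if the evidence reaches threshold; firing lowers the threshold."""
        fired = evidence >= self.threshold
        if fired:
            self.threshold = max(0.0, self.threshold - self.drop)
        return fired

    def rest(self, steps):
        """Let the threshold drift slowly back toward its resting value."""
        recovered = self.threshold + steps * self.recovery_per_step
        self.threshold = min(self.resting_threshold, recovered)


if __name__ == "__main__":
    unit = Logogen("manage")
    print(unit.present(0.8))   # weak evidence, unprimed
    print(unit.present(1.0))   # sufficient evidence: the unit fires
    print(unit.present(0.8))   # the same weak evidence now suffices
    unit.rest(10_000)          # long delay: threshold back at rest
    print(unit.present(0.8))
```

In this sketch repetition priming appears as the second successful response to weak evidence, and it disappears once the threshold has recovered.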

Arguably, this mechanism would work well if, as the repetition priming
data suggest, the lexicon counted a stem morpheme and its regularly inflected,
but not derived, forms as the same word. Unaffixed words and their inflected
relatives are the same part of speech with essentially the same core meaning;
in a sense they are the same word with the difference between them determined
by the grammatical context in which the word appears. Consequently a common
frequency-based expectancy is meaningful for classes of words differing only
in inflectional affix. In contrast, unaffixed words and their derived
relatives often are not the same part of speech, they need not be close in
meaning (cf. Aronoff, 1976), and, consequently, a common frequency-based
expectancy for unaffixed words and their derived relatives would not be
meaningful.

The second role for a lexical entry may be in providing appropriate input
to regular and productive phonological rules of the language. In generative
phonology (Chomsky & Halle, 1968), a lexical entry includes just that
phonological information about a word that is not predictable by rule, and hence
that uniquely identifies a word. The phonological rules that are most
productive and regular in English (and thus, perhaps, that are most likely to
be learned by language users [cf. Berko, 1956; Ohala, 1974; Steinberg, 1973])
are rules of inflection. The finding that inflected words prime their stems
fully, then, is consistent with a lexicon in which inflected words have no
independent representation. Certain speech errors (for example, morpheme shifts
and strandings [Garrett, 1980a, 1980b]) have been interpreted as supporting a
similar conclusion; so have the speech patterns of some Broca's and jargon
aphasics (see Butterworth, 1983, for a review of the relevant evidence).

Despite the consistent and plausible view of the lexicon provided by
repetition-priming findings, we decided to investigate priming patterns
further for two reasons. The first reason is that repetition effects are found
in the memory literature (e.g., Light & Carter-Sobell, 1970) in which they are
ascribed to episodic, not to lexical memory, and episodic sources of priming
are found using paradigms very similar to the repetition priming paradigms
themselves (Feustel et al., 1983).

Moreover, it is not difficult to imagine how episodic influences might
contribute to priming using the procedures of Morton or of Stanners et al.
Subjects may explicitly recall having seen a word (or morphological relative)
previously in the experiment, and in the procedure of Stanners et al., they
may recall the response they made to it. This recollection may facilitate
responding to a primed word. These episodic sources of priming are unlikely
to exhaust the repetition priming that occurs (cf. Jacoby & Dallas, 1981);
however, added to lexical sources of priming, they are likely to exaggerate
the apparent loss in priming of an unaffixed form by a derived form as
compared to its priming by an inflected form or by itself. This exaggerated
difference would occur because derived forms generally are less formally or
semantically similar to stem forms than are inflected forms (and, of course,
than is a stem word to itself). Consequently, memory for a derived prime may
be less likely to be cued during later presentation of the unaffixed word than
memory for an inflected prime or for the target word itself. Accordingly,
full priming between a word and itself or between a word and an inflected
variant may include both lexical and episodic sources of priming, whereas
partial priming as between a derived prime and unaffixed target may include
only lexical sources of priming.

Our second reason to explore further the patterning of repetition priming
derives from questions raised about any repetition priming having a lexical
rather than an episodic origin. The main question is whether repetition
priming originating in the lexicon always reflects repeated access to a common
lexical entry. In a recent review of the literature on word recognition,
speech errors, and the speech and reading patterns of various
language-disabled populations, Butterworth (1983) disputes the conclusion that
lexical entries are common to unaffixed words and their affixed relatives in
English. Instead, in his view, the bulk of evidence supports separate but
associated entries for all words. If this interpretation is correct, then
repetition priming may occur between separate entries in the lexicon. Our
research investigates the distinction between shared lexical entries for
morphological relatives and associated, but separate entries.

Our first experiment was designed to test for episodic sources of
influence on repetition priming. Having found it, we take steps in later
experiments to reduce or eliminate it and to reexamine the pattern of repetition
priming among stems and regularly and irregularly inflected and derived
morphological relatives. This patterning suggests hypotheses concerning the
organization of morphologically related words in the lexicon.

Experiment 1



As an index of episodic priming, we chose to look at repetition priming
on nonwords, both regular and irregular. The literature does not offer a
clear indication of whether nonword repetition priming should be found using a
lexical-decision paradigm. Forbach et al. (1974) report essentially no
repetition priming among nonwords; however, Scarborough et al. (1977) report
some priming of this type. Stanners et al. (1979) do not report their
findings on nonwords.

In the present experiment, we examine repetition priming among words and
nonwords under conditions replicating those in which Stanners et al. found
full repetition priming of base forms by inflected morphological relatives and
partial priming by derived forms.



Subjects. Subjects were 25 Dartmouth College undergraduates who
participated in the experiment for course credit. All were native speakers of 
English and had normal or corrected vision. 

Stimulus materials. The stimuli used in the experiment were 48 English
words and 48 nonwords. The words formed two groups. One group (Inflections
Only) was presented both without suffixes, called "base" stimuli, and with
inflectional suffixes, "s" and "ed." The second group (Derivations and
Inflections) appeared as base stimuli, with the inflectional suffixes "s" and
"ed," and with two derivational suffixes ("ment" and one of "er"/"or" or
"able"/"ible"). Thus, within the second group, the effects of inflectional
and derivational forms of the same word can be compared with each other.
Words were chosen so that suffixation did not change the spelling or
pronunciation of the base.

Nonwords formed three groups. Items in the first group (Nonword,
Inflections Only) were created from real words having the same characteristics
as the real words in the Inflections Only group. To form the nonwords, one or
two letters in the real words were changed. The resulting nonwords were
orthographically regular. These were presented both in a base form and with
inflectional suffixes. Thus, they are the nonword counterparts of the first
group of real words. The second group (Irregular, Inflections Only) consisted
of ten irregular four-letter constructions, and these were also presented
both as base forms and with inflectional suffixes, "s" and "ed." Irregular
nonwords were included in the study to provide an index of episodic priming in
nonwords presumed not to have any form of representation in the lexicon. The
third group of nonwords (Nonword Derivations and Inflections) were analogous
to the second group of real words. They were orthographically regular and
were presented as base forms, with inflectional suffixes, "s" and "ed," and
with the derivational suffixes, "ment," "er"/"or" or "able"/"ible." The words
used in the experiment and the words from which the 38 regular nonwords were
formed were equated on average length and on mean and median frequency (Kučera
& Francis, 1967). Real-word base forms are listed in Appendix A.

Five test orders were created, each one including the following priming
conditions in equal numbers: (1) base as target with no prime (henceforth
B1), (2) base as prime and base as target (BB; e.g., "manage"-"manage"), (3)
inflection as prime and base as target (IB; e.g., "manages"-"manage"), and (4)
derivation as prime and base as target (DB; e.g., "management"-"manage").
Across test orders, items appeared in identical serial positions, but the
sequences differed in which version of each item served as a prime. For
example, for the base word "manage," the forms "manage," "manages," "managed,"
"management," and "manager" served as primes in the different test sequences.
In all sequences, the target was "manage." For items occurring only as base
forms and inflections in the experiment, each inflected form (i.e., "s," "ed")
occurred in two test sequences and the base form in one as primes.
Inflections, derivations and base first occurrences were distributed
proportionately over the five test sequences.
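
The rotation of prime forms across the five test orders can be sketched as follows. This is an illustration of the counterbalancing idea only, under the assumption of a simple cyclic rotation; the actual assignment scheme is not specified in the text. The function name rotate_primes is hypothetical, and "walk" is a hypothetical stand-in for an Inflections Only item.

```python
def rotate_primes(prime_forms, n_orders=5):
    """Return one {target: prime} mapping per test order.

    Each item's list of prime forms is rotated cyclically, so that every
    listed form serves as the prime in exactly one test order.
    """
    orders = []
    for k in range(n_orders):
        order = {}
        for i, (target, forms) in enumerate(prime_forms.items()):
            order[target] = forms[(i + k) % len(forms)]
        orders.append(order)
    return orders


PRIME_FORMS = {
    # Derivations and Inflections item, with the forms given in the text.
    "manage": ["manage", "manages", "managed", "management", "manager"],
    # Hypothetical Inflections Only item: each inflected form fills two
    # slots and the base form fills one.
    "walk": ["walk", "walks", "walked", "walks", "walked"],
}

if __name__ == "__main__":
    for number, order in enumerate(rotate_primes(PRIME_FORMS), start=1):
        print(f"test order {number}: {order}")
```

The target is the dictionary key in every order, which mirrors the statement that the target was always the base form.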

Subjects saw each morpheme only twice: once as a prime and once as a
target. The average lag between the occurrence of a prime and the occurrence
of its target was nine intervening trials; lags ranged from 6 to 12 and each
lag was equally frequent among words and nonwords. Filler items were used as
necessary to maintain appropriate lags. Each subject completed five blocks of
56 trials each, the first of which was a block of practice trials. Primes and
targets were presented within one block.
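
As a check on the arithmetic, lags of 6 through 12 used equally often do average to nine intervening trials. The sketch below also shows one way to count intervening trials in a trial list; the helper name intervening_trials and the filler labels are hypothetical, and trials are represented by stem labels only.

```python
from statistics import mean

# Lags from 6 to 12 inclusive, each assumed equally frequent.
POSSIBLE_LAGS = range(6, 13)


def intervening_trials(sequence, stem):
    """Count the trials between the first two occurrences of a stem."""
    prime_position = sequence.index(stem)
    target_position = sequence.index(stem, prime_position + 1)
    return target_position - prime_position - 1


if __name__ == "__main__":
    print(mean(POSSIBLE_LAGS))
    # Prime, nine hypothetical fillers, then the target.
    trials = ["manage"] + [f"filler{i}" for i in range(9)] + ["manage"]
    print(intervening_trials(trials, "manage"))
```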

Design. Five subjects were assigned to each of the five test orders. 
The independent variables were Priming Condition and Lexical Status (word, 
nonword). The main dependent variable was response time. 

Procedure. Subjects were run individually. The experiment was run on a
time-sharing computer interfaced with a Polytronics response timer. The
stimuli were presented in upper case on a cathode ray tube. On each trial the
following sequence of events occurred: (1) a fixation string of plus signs
(++++++++) came on; (2) the terminal bell sounded 500 ms before the fixation 
mark went off; (3) a letter-string appeared as soon as the fixation mark dis- 
appeared, and remained on until the subject responded; (4) once the subject 
responded and the stimulus disappeared, the fixation mark returned and another 
trial began. 

For each subject, the "K" key of the computer terminal was pressed with
the right index finger for a word stimulus and the "D" key with the left index
finger for a nonword stimulus. The keys were labeled with the symbols "W" and
"NW" for "word" and "nonword" respectively. Subjects were informed that both
accuracy and speed of responding were important, and that accuracy should be
kept above 90% correct on each block of trials.

Between blocks of trials subjects were informed of their mean reaction 
times and proportions correct for the preceding block of trials. Blocks were 
initiated by the subject. 



Errors and extreme reaction times (greater than 2000 ms or more than 2.5
standard deviations from the individual subject's or item's mean) were
excluded from the analysis. This procedure excluded less than one percent of
the responses. When a subject responded incorrectly to one member of a
prime-target pair, both responses were excluded from the analyses. Table 1
presents mean response times and errors to base targets.
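
The exclusion rule for extreme reaction times can be sketched for a single subject's correct responses. This is a minimal illustration, not the original analysis code: it assumes that the mean and standard deviation are computed over all of the supplied times before any are removed, which the text does not specify, and the function name trim_reaction_times is hypothetical.

```python
from statistics import mean, stdev


def trim_reaction_times(rts_ms, ceiling_ms=2000.0, n_sd=2.5):
    """Drop times above the ceiling or more than n_sd SDs from the mean.

    Assumption: the mean and SD are taken over the full untrimmed list.
    """
    centre = mean(rts_ms)
    spread = stdev(rts_ms)
    return [
        rt for rt in rts_ms
        if rt <= ceiling_ms and abs(rt - centre) <= n_sd * spread
    ]


if __name__ == "__main__":
    # Hypothetical reaction times in ms: nineteen typical and one slow response.
    times = [500] * 19 + [900]
    print(len(times), "->", len(trim_reaction_times(times)))
```

The same function could be applied to an item's responses across subjects to implement the item-based criterion.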

In all experiments, error rates will be reported in the appropriate
tables. Analyses on the error rates will be reported only if they are
significant.



Table 1

Mean Reaction Times for Words and Nonwords for the Various Prime-Target
Conditions of Experiment 1

                                 B1     BB         IB         DB
Words
  Inflections Only               602    516(.10)   513(.12)
  Derivations and Inflections    552    499(.01)   50?(.04)   525(.07)
Nonwords
  Inflections Only               689    65?(.09)   648(.18)
  Irregular, Inflections Only    625    551(0)     585(.06)
  Derivations and Inflections    691    615(.16)   653(.13)   675(.13)

Note. Error rates are in parentheses.
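
The size of a priming effect can be read from Table 1 as the unprimed mean (B1) minus the primed mean. The sketch below does this arithmetic for the word conditions, using only those cell values that are clearly legible in the table; the dictionary layout and the function name priming_effects are hypothetical.

```python
# Mean reaction times in ms for words, taken from Table 1 (legible cells only).
MEAN_RT_MS = {
    "Inflections Only": {"B1": 602, "BB": 516, "IB": 513},
    "Derivations and Inflections": {"B1": 552, "DB": 525},
}


def priming_effects(row):
    """Return the B1 mean minus each primed mean, in ms."""
    baseline = row["B1"]
    return {cond: baseline - rt for cond, rt in row.items() if cond != "B1"}


if __name__ == "__main__":
    for group, row in MEAN_RT_MS.items():
        print(group, priming_effects(row))
```

These differences are descriptive only; whether an effect counts as full or partial priming depends on the statistical comparisons reported in the text.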



One-way subject and item analyses were performed on response times to
base words (conditions B1, BB, IB, and DB). Separate analyses were done on
the 32 items appearing only in inflected and base forms (Inflections Only),
and on the 16 items appearing in derived, inflected, and base forms
(Derivations and Inflections). For the Inflections Only group of words, the
effect of priming condition was significant (subjects: F(2,40) = 17.90,
p < .001; items: F(2,62) = 20.09, p < .001). Scheffé's tests revealed that the
significant main effect was due to the B1 condition differing from the BB and
IB conditions (subjects: F(2,40) = 13.8, p < .001; items: F(2,62) = 15.5,
p < .001). The difference between the BB and IB conditions was not significant.

An analogous analysis on the remaining 16 words revealed a similar
outcome for inflections, but only a partial repetition effect for derivations.
The main effect of priming condition was significant (subjects:
F(3,60) = 6.17, p = .001; items: F(3,45) = 4.87, p = .005). Scheffé's tests
showed that this effect was again due to the B1 condition differing from the
BB and IB conditions (subjects: F(3,60) = 3.77, p = .015; items:
F(3,45) = 3.62, p = .02). The BB and IB conditions did not differ from each
other. In the DB condition, the mean response time did not differ from either
B1 response time or BB and IB response times. These results are very similar
in pattern to those of Experiments 1 and 3 of Stanners et al. (1979).

Similar analyses were performed on nonwords. Separate analyses were done
on response times to the regular nonwords appearing only in inflected and base
forms, the 16 regular nonwords appearing as derivations, inflections, and
bases, and the 10 irregular nonwords. The effect of priming condition was
marginally significant for the Nonword Inflections Only group in the subject
analysis only (subjects: F(2,40) = 2.96, p = .06; items: F(2,42) = 1.29,
p = .28). Priming was significant for nonwords in the Nonword Derivations and
Inflections group in the subject analysis, F(3,60) = 5.55, p = .002, and
marginally significant in the item analysis, F(3,45) = 2.53, p = .06.
Scheffé's tests showed the significance of the former effect to be due to the
difference between the B1 and BB conditions, F(3,60) = 4.95, p = .004.
Irregular Inflections Only reached significance in both analyses (subjects:
F(2,40) = 7.24, p = .002; items: F(2,18) = 4.25, p = .03). These effects were
also attributable, as shown by Scheffé's tests, to the difference between the
B1 and BB conditions (subjects: F(2,40) = 7.17, p = .002; items:
F(2,18) = 4.16, p = .03).

Discussion 

The real-word results of Experiment 1 replicate the results of Stanners
et al. (1979). Significant repetition priming of targets occurred for both
base and inflection primes; derivations also primed their bases, but
marginally. Stanners et al. interpreted the corresponding partial repetition
effect they found to signify that derivations (and irregular inflections) have
lexical entries separate from their base forms.

The nonword results obtained in the present experiment weaken this expla- 
nation. Presumably, nonword repetition effects, particularly those among 
irregular nonwords, are largely episodic rather than lexical in origin. That 
is, they occur because subjects remember explicitly having seen the letter 
strings previously in the experiment and, perhaps, having made a particular 
response to them. If episodic priming affects response time to nonwords, it 
may also contribute to repetition priming in words.² If it does, then partial
repetition effects may reflect decreased episodic priming; the less the target 
in a prime-target pair looks like the prime, the less it reminds the subject 
of the prime. 

Considerations such as these led us to repeat this study with an attempt 
to reduce the effects of episodic memory on subject responses. 

Experiment 2 

In an effort to reduce episodic contributions to the repetition effect, 
we extended the lag between primes and targets of a base morpheme from an
average of 9 items in Experiment 1 to 48 items in Experiments 2a and 2b. 

In addition, we instituted a control for unequal practice on primes and
targets. Necessarily, the prime of a morpheme appears earlier in the test
sequence than its target. Consequently, subjects are less practiced on the
average when they respond to primes than when they respond to targets.
Possibly, such an effect, too, contributes to priming.

Any asymmetrical practice of this sort can be eliminated by a procedure 
first used by Forbach et al. (1974) but not used subsequently by Stanners et 
al. (1979). In the control procedure, the test sequence of words is parti- 
tioned into blocks. In the first block of test trials, only fillers and 
primes of morphemes are presented. In the second block, primes from the first
block are repeated as targets interleaved with a new set of primes. In
subsequent blocks except the last, new primes are interleaved with repetitions
of primes (now targets) from the previous block. In the final block, targets
are interleaved with fillers. For most analyses, data from the first and last
blocks are eliminated. In this way, analyses are restricted to comparisons of
responses to primes and targets made at comparable levels of practice. Across
subjects, words are counterbalanced so that every morpheme occurs equally
often in each block as a prime and target.

Two experiments were run using these changes in procedure. In Experiment 
2a, primes were inflections and base forms. In Experiment 2b, they were
derivations and base forms. 

Method 

Subjects. Subjects were 72 students from the same pool as in Experiment
1. Thirty-six subjects participated in each of Experiments 2a and 2b. This
gave three replications of all of the test-order conditions in each
experiment.

Stimulus materials: Experiment 2a. Stimuli were 48 words and 48 nonwords
matched in length to the words. Each word, a verb, appeared as a prime
in each of three forms: uninflected (base), inflected with "s," and inflected
with "ed." An individual subject saw each morpheme only twice: once as a
prime and once as a target. In every instance, inflected forms preserved both
the spelling and the pronunciation of the base. Targets were invariably base
forms. Real-word base forms appear in Appendix B.

Nonwords were 24 orthographically regular and 24 irregular nonwords.
Each nonword appeared as a prime in three forms: uninflected, inflected with
"s," and inflected with "ed." As for the words, targets of nonwords were
invariably "base" forms.

Experiment 2b. Stimuli were 48 words and 48 nonwords matched to the
words in length. Each word and nonword appeared as a prime in each of three
forms: unaffixed, and affixed with two of several derivational affixes (two
of "ment," "less," "er," "ly," "ness," "able," "ful"). As in Experiment 2a,
each subject saw a given morpheme only twice. All nonwords were
orthographically regular. Real-word bases are listed in Appendix B.

Test orders. The test sequences consisted of one practice block and five
test blocks each 48 trials in length. For purposes of counterbalancing, the
96 letter strings in the test list were partitioned into four sets. Each set
included 12 words and 12 nonwords (four bases, eight affixed items). A Latin
Square was used to order the sets into four different sequences. For example,
the Latin Square ordering 1-2-4-3 created a test sequence in which items in
the first set constituted the primes of the first block of the test sequence
and the target repetitions of the second block. Primes in block one were
interleaved with filler items. Items in set 2 provided the primes in block 2 of
the test sequence and the target repetitions in block 3. Items in set 4
provided the primes in the third block and the target repetitions in the fourth
block. Finally, items in set 3 provided the primes of block 4 and the target
repetitions in the final block. In the last block, set 3 items were
interleaved with fillers. The ordering procedure created a lag of 48 items
between the prime and target of a morpheme.
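
The mapping from one Latin Square row to the block structure just described
can be sketched in a few lines of Python. This is an illustrative sketch
only, not the software used in the experiments: the function name
build_block_plan and the dictionary layout are hypothetical. The only facts
taken from the text are that each set supplies the primes of one block and
the targets of the next, and that fillers complete the first and last blocks.

```python
def build_block_plan(set_order):
    """Turn one Latin Square row into a per-block plan (hypothetical helper).

    set_order lists the stimulus sets in the order they serve as primes.
    A set's items are primes in one block and targets in the following block,
    so k sets need k + 1 test blocks. None marks a slot taken by fillers.
    """
    n_sets = len(set_order)
    plan = []
    for block in range(1, n_sets + 2):
        # Primes exist for blocks 1..k; the final block has fillers instead.
        primes = set_order[block - 1] if block <= n_sets else None
        # Targets exist for blocks 2..k+1; the first block has fillers instead.
        targets = set_order[block - 2] if block >= 2 else None
        plan.append({"block": block, "primes": primes, "targets": targets})
    return plan


# The example ordering given in the text.
plan = build_block_plan([1, 2, 4, 3])
for row in plan:
    print(row)
```

With the ordering 1-2-4-3 this gives five test blocks, which agrees with the
five 48-trial test blocks reported above; blocks 2-4 are the ones in which
both primes and targets occur.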

The four test orders, each based on one row of the Latin Square, appeared
in three versions. The versions were identical except for the affixes on
their first occurring morphemes. For example, matched to a test order in
which, say, "pushes" appeared as a prime in block 2, were two test orders in
which "push" and "pushed" respectively appeared as prime in block 2. In each
experiment, one third of the priming items were bases; one third were words
affixed with "s" in Experiment 2a and one of the derivational affixes in
Experiment 2b; the remaining third included words affixed with "ed" in
Experiment 2a and words with other derivational affixes in Experiment 2b. This
gave 12 different test orders for each of Experiments 2a and 2b.

Design. Subjects experienced all levels of the independent variable,
priming condition. The primary dependent measure was response time.

Procedure. The procedure was identical to that in Experiment 1.

Results 1

Response times and errors were analyzed as in Experiment 1. Table 2
presents response times and errors to base words and nonwords in blocks 2-4
from Experiments 2a and 2b.

Response times to base words in Experiment 2a differ as a function of
their priming condition (subject analysis: F(2,70) = 54.73, p < .001; item
analysis: F(2,94) = 46.59, p < .001). Scheffé's tests reveal no significant
difference on the subjects analysis in response times to BB and IB words,
F(2,70) = 2.56, p = .08. However, the difference does reach significance on
the item analysis, F(2,94) = 3.32, p = .04. The 78 ms difference between
conditions B1 and IB is significant (subjects: F(2,70) = 29.45, p < .001;
items: F(2,94) = °2.9, p < .001). Statistically, then, the repetition
effects of inflected words on bases are full.



Table 2

Response Times to Words and Nonwords in Experiments 2A (Left) and 2B (Right)

                  Experiment 2A                     Experiment 2B
             B1     BB          IB             B1     BB          DB

Words        611    510 (.07)   533 (.07)      585    543 (.05)   538 (.03)

Nonwords     643    627 (.14)   645 (.10)      715    717 (.17)   730 (.16)

Note: Error rates are in parentheses.



Analysis of the response times to base and derived forms of Experiment 2b
gives a similar picture (subjects analysis: F(2,70) = 9.03, p < .001; items
analysis: F(2,94) = 8.24, p < .001).
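
As a worked check on the word means in Table 2, the size of each repetition
effect can be computed as the B1 (first-occurrence) mean minus the mean for
the primed target. The sketch below is only arithmetic on the published
means; the names TABLE_2_WORDS and priming_effects are hypothetical and are
not part of the original analysis.

```python
# Mean response times (ms) to base words, copied from Table 2.
TABLE_2_WORDS = {
    "2a": {"B1": 611, "BB": 510, "IB": 533},
    "2b": {"B1": 585, "BB": 543, "DB": 538},
}


def priming_effects(means):
    """Return B1 minus each primed-target mean, in ms (hypothetical helper)."""
    baseline = means["B1"]
    return {cond: baseline - rt for cond, rt in means.items() if cond != "B1"}


effects_2a = priming_effects(TABLE_2_WORDS["2a"])
effects_2b = priming_effects(TABLE_2_WORDS["2b"])
print(effects_2a)  # {'BB': 101, 'IB': 78}
print(effects_2b)  # {'BB': 42, 'DB': 47}
```

The 78 ms value for IB matches the difference between conditions B1 and IB
reported above. In Experiment 2b the effect for derived primes (47 ms) is at
least as large as the effect for exact repetition (42 ms).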

Table 2 also provides the comparable findings on nonwords. Repetition
priming among nonwords was statistically absent in both studies (Experiment
2a: subjects analysis: F(2,70) = 2.02, p = .14; item analysis: F(2,94) =
1.22. In Experiment 2b both F values are less than 1.) Thus, there is no
apparent episodic repetition priming on nonwords in these experiments in which
a 48-item lag is used and in which the control procedure for practice is
implemented.³ (In all subsequent experiments, nonword effects will be reported
in tables, but not described in the text unless they involve statistically
significant effects.)

Discussion 

Having significantly reduced evidence of episodic priming in nonword 
stimuli, we obtain a somewhat different picture of repetition priming in de- 
rived and inflected words than we obtained in Experiment 1 and than Stanners 
et al. (1979) report. In particular, we find that repetition priming of a 
base form by a derivational relative is as strong as priming by an inflection- 
al relative. Moreover the priming is statistically and, in Experiment 2b, 
numerically, full. 

These findings invite one of two salient interpretations. One,
compatible with Butterworth's assessment of the lexicon (1983), is that
repetition priming occurs among separate lexical entries in the lexicon; it is
not a consequence (except in the case of exact repetitions) of repeated access
to a common lexical entry. A second is that it does reflect repeated access,
but a
lexical entry is more inclusive than had previously been suggested by repeti- 
tion-priming findings. As we will suggest in the General Discussion, the 
substantive differences between these views are smaller, in light of con- 
straints on their realizations imposed by our findings, than the statements of 
them suggest. 

In the next experiment, we further examine the kinds of
morphologically-related words that are strongly associated, or that share a
lexical entry. We do so by examining priming of an unaffixed form by affixed
morphological relatives that do not necessarily preserve the spelling or
pronunciation of the stem morpheme in the unaffixed form. In addition, we
examine two types of derivationally affixed words.

Possibly, the derived words we used in Experiment 2 were special and gave 
rise to unrepresentatively strong priming. Chomsky and Halle (1968) identify 
two types of suffix in English. One, neutral affixes, includes inflections 
and some derivations; these affixes do not affect pronunciation of the stem 
morphemes to which they are attached. In contrast, nonneutral (derivational) 
affixes do affect the stem morpheme's pronunciation (e.g., "sign"-"signal").
In Chomsky and Halle's theory, neutral affixes are separated from the stem
morpheme by a word boundary, which prevents application of phonological rules
over extents spanning stem and affix. Nonneutral affixes are separated from
the stem by lesser, morpheme boundaries that do not prohibit application of
phonological rules over the whole domain of stem plus affix. In our
Experiment 2b, affixes were neutral derivational affixes. Perhaps it is not
surprising that neutrally-affixed derivations were as effective primes as 
inflected words. 

In Experiment 3, we compare priming of unaffixed words by morphological
relatives that do or do not share pronunciation or spelling of the stem
morpheme with the unaffixed form. This allows us to compare priming by
irregular inflected words and regular morphological relatives (cf. Kempley &
Morton, 1982). In addition, in a post hoc analysis, we look specifically at
neutrally and nonneutrally affixed derivations and compare their priming
effectiveness.

Experiment 3 

In the present experiment, we examine priming by morphologically-related 
forms in which either the pronunciation or the spelling and pronunciation of 
the common morpheme is not shared by prime and target forms. The experiment 
had two purposes in addition to the one just described of examining priming by 
derived forms with nonneutral affixes. A related purpose was to reexamine
effects of decreases in formal overlap (and hence, for English, in regularity)
between morphologically-related primes and targets on repetition priming.
Stanners et al. had found that priming of a base by an affixed form decreases
as formal overlap between the affixed and unaffixed words decreases. Kempley
and Morton (1982) found no priming between irregular and regular forms when
the words were presented auditorily. The present study was designed to
reexamine these priming effects under the conditions we have developed which
reduce episodic priming effects. Possibly, in the earlier studies, the
differences in priming across conditions were episodic in origin; targets
following
formally identical or similar primes cued memory for the primes while dissimi- 
lar targets did not. A final purpose of the experiment was to separate ef- 
fects of orthographic and phonological overlap between prime and target on the 
magnitude of priming. 

Method

Subjects. Thirty-six students participated in Experiment 3a and 24
different students in Experiment 3b. All came from the same subject pool used
previously.

Stimulus materials. Two sets of twenty-four word triads were devised.
In one set, the "Sound Only" set, each triad included one base form and two 
affixed forms; one affixed form preserved the spelling and pronunciation of 
the unaffixed form (henceforth the "NC" or "no change" form) and one preserved 
only the spelling (henceforth the "C" or "changed" form). A sample triad is 
"heal," "healer," and "health." (In six items, a silent "e" in the base 
morpheme was deleted in an affixed form.) The second set, the "Sound and Spel- 
ling" set, also consisted of triads including an unaffixed form and two af- 
fixed words. In this set, one affixed word shared both spelling and 
pronunciation of the base morpheme with the unaffixed word (the "NC" form for 
this set) while the other affixed word shared neither spelling nor
pronunciation with the unaffixed word (the "C" form). An example is "clear,"
"clearly," "clarify." In both sets, words in the third category were, with few
exceptions, irregular forms.

Because Experiment 2 showed no difference in priming by inflected and
derived forms that shared spelling and pronunciation with the unaffixed form,
we felt justified in mixing the two types of affixed forms in our new lists.
However, approximately equal numbers of derived forms and equal numbers of
inflected forms occurred in the Sound Only and Sound and Spelling triads, and
there were sufficient numbers of pairs of neutrally-affixed forms and
nonneutrally-affixed forms that they could be examined separately in a
post-hoc analysis.

Phonological overlap between unaffixed and affixed words was matched
across Sound Only and Sound and Spelling lists by counting each vowel,
consonant, or stress change as one change and matching number of changes across
the two lists. In addition, an effort was made to match type of change (vowel,
consonant or stress) as closely as possible. Our final experiment (Experiment
4b) is an auditory lexical decision experiment using these materials, which
shows that our matching efforts were successful.

Unaffixed words in the Sound Only and Sound and Spelling lists were 
matched in length and frequency (Kučera & Francis, 1967). Similarly, the two
different types of affixed forms were matched in length and frequency within 
and across the two lists. Appendix C lists the word triads in the two stimu- 
lus sets. 

We created triads of nonwords from triads of words that might have ap- 
peared as word stimuli in the experiment. They were made into nonwords by 
changing one or two letters, while preserving their orthographic regularity.
Forty-eight nonword triads were created in this way. 

From the sets of words and nonwords, three basic stimulus lists were 
created. Each base morpheme appeared twice in each list, once as a prime and 
once as a target. The lists differed in respect to which version of the 
morpheme (unaffixed, affixed with no sound or spelling change, affixed with a 
change) appeared as the prime. The target was always the unaffixed form. In 
each list there were sixteen of each type of prime. Half of each set of 16 
items was from each of the two sets of stimulus words. There were sixteen of
each type of nonword prime. 

The stimulus lists were organized exactly as in Experiments 2a and 2b. 
As in those experiments, four versions of each basic list were created so 
that, across subjects, each prime occurred equally often in the first four 
blocks of stimuli. Each stimulus list was preceded by a practice list of 24 
words and 24 nonwords randomly ordered. 

Procedure. The experiment was run twice. The second experiment (3b) was
identical to the first (3a) except that the stimuli were presented under
degraded viewing conditions (by turning down the contrast on the CRT screen)
in an effort to slow response times and thereby, perhaps, magnify the very
small departures from full repetition priming we observed in Experiment 3a.
This manipulation had no effect on the pattern of reaction times we observed;
therefore, we present both outcomes together.

The procedure and instructions to the subjects were identical to those 
used in Experiment 2. 

Design. Subjects participated at all levels of the two independent
variables, Stimulus Set (Sound Only, Sound and Spelling) and Priming Condition
(B1, BB, NCB ["no-change/base": a base primed by an affixed word in which the
sound and spelling of the unaffixed base morpheme is preserved], CB
["changed-form/base": a base primed by an affixed word in which the base
pronunciation or spelling and pronunciation is changed from the unaffixed
version]). The major dependent measure is response time.

Results 1

Extreme response times were deleted from the data as described for
Experiment 1. Results for word and nonword stimuli are presented in Table 3
collapsed over the factor Stimulus Set. Separate two-way repeated measures
analyses of variance with factors Priming Condition (B1, BB, NCB, CB) and
Stimulus Set (Sound Only, Sound and Spelling) were performed on the outcomes
of the two experiments using subjects as a random effect. Separate items
analyses were also run with one within-groups factor (Priming Condition) and
one between-groups factor (Stimulus Set). In Experiment 3a, the effect of
Priming Condition reached significance in both subjects and items analyses
(subjects: F(3,105) = 20.82, p < .001; items: F(3,138) = 12.81, p < .001). The
effect of Stimulus Set was significant in the subjects analysis, with response
times faster in the Sound Only condition, but was nonsignificant in the items
analysis (subjects: F(1,35) = 12.59, p < .001; items: F(1,46) = 2.44,
p = .12). The interaction did not approach significance in either analysis
(both Fs < 1).
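
The degrees of freedom in these tests follow from the design, and can be
checked with a short calculation. The sketch below assumes the standard
formulas for a repeated-measures factor and for a between-groups factor; the
function names are hypothetical. The counts come from the text: 36 subjects
in Experiment 3a, four priming conditions, and 48 word items in two sets of
24 triads.

```python
def within_df(n_levels, n_units, n_groups=1):
    """(effect df, error df) for a factor varied within units.

    Units are subjects or items. n_groups is the number of between-unit
    groups (1 when every unit belongs to a single group).
    """
    effect = n_levels - 1
    return effect, effect * (n_units - n_groups)


def between_df(n_groups, n_units):
    """(effect df, error df) for a factor varied between units."""
    return n_groups - 1, n_units - n_groups


# Subjects analysis: both factors are within subjects (36 subjects).
print(within_df(4, 36))              # Priming Condition -> (3, 105)
print(within_df(2, 36))              # Stimulus Set      -> (1, 35)

# Items analysis: 48 items split between two stimulus sets.
print(within_df(4, 48, n_groups=2))  # Priming Condition -> (3, 138)
print(between_df(2, 48))             # Stimulus Set      -> (1, 46)
```

These values agree with the F ratios reported for Experiment 3a.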



Table 3

Response Times in Experiments 3A and 3B

                                        B1     BB          NCB         CB

Words
  Experiment 3A                         623    558 (.05)   575 (.09)   584 (.06)
  Experiment 3B                         6T3    590 (.05)   612 (.06)   621 (.09)
  Neutral and nonneutral derivations    669    579         586         601

Nonwords
  Experiment 3A                         760    748 (.14)   758 (.15)   746 (.13)
  Experiment 3B                         788    777 (.18)   779 (.17)   783 (.18)

Note: Error rates are in parentheses.



The effect of prime type is due primarily to the difference between the
response to an unaffixed prime and its occurrence as a target following any of
the three primes (subjects: F(3,105) = 17.75, p < .001; items: F(3,138) =
10.80, p < .001). Among the prime conditions, the difference in the effect of
an unaffixed prime (BB) as compared to the effects of the other primes (NCB,
CB) reaches significance in the subjects analysis, but not in the items
analysis (subjects: F(3,105) = 2.80, p = .04; items: F(3,138) = 1.73, p = .16).
The additional effect of sharing or not sharing spelling or pronunciation with
the base (that is, the difference between 575 and 584) is not significant.

We performed additional analyses on the data of Experiment 3a having
removed the six items from the Sound Only condition in which presence and
absence respectively of a silent "e" distinguished the base and affixed forms.
Removing these items had no effect on the outcome of the experiment. The
effect of prime condition remained highly significant; neither the effect of
Stimulus Set nor the interaction was significant.

A major finding of Experiment 3a is that priming by affixed forms is 
nearly full. The priming by NCB forms replicates the outcome of Experiments 
2a and 2b. Overall in the present experiment, 17 ms less priming occurs when
the prime differs from the target in being affixed, but shares sound and
spelling with the target, as compared to priming by the unaffixed form itself.
CB forms reduce priming by an additional 9 ms.

We ran Experiment 3b to ask whether, by slowing response times, we could
magnify the small differences we observed between the BB, NCB, and CB
conditions. Our manipulation, reducing the contrast on the CRT screen, slowed
response time overall by 42 ms. The slowing was significant in an items
analysis (F(1,92) = 8.23, p = .006), but not in the subjects analysis
(F(1,58) = 2.17, p = .14). There were no interactions involving the factor
Experiment in the overall analysis and, in the analysis of Experiment 3b, there
was no increase in the magnitude of the separation of BB and NCB times on the
one hand or NCB and CB times on the other. Statistical analysis of the response
times in Experiment 3b provided an identical pattern of significant effects to
the pattern observed in Experiment 3a.

In Experiment 3b, error proportions were .04, .06, and .09 on BB, NCB,
and CB items, respectively. This was significant (subjects: F(2,46) = 6.07,
p = .005; items: F(2,92) = 6.96, p = .002).

A final analysis examined neutrally- and nonneutrally-affixed derivations
separately from the irregular inflected forms that were included in the stimu- 
lus sets. The purpose of the analysis was to answer the question raised by 
the finding in Experiment 2 that derivations as well as inflections fully 
primed their base forms. The question raised was whether this finding is 
limited to neutrally-affixed derivations, which preserve the pronunciation of 
the base morpheme. 

Eight Sound Only and ten Sound and Spelling triads permitted a comparison
of priming by NC neutrally-affixed derivations and by C nonneutrally-affixed
derivations. These 18 items were subjected to a one-way analysis of variance
with the single factor Prime Condition (B1, BB, NCB, and CB). The analysis
collapsed over the nonsignificant factor, Stimulus Set, and across Experiments
3a and 3b. Only the items analysis was performed. As Table 3 reveals, the
pattern of means mirrors very closely that of the overall analysis. The
pattern of significant and nonsignificant differences is also the same as in
the overall analysis. Thus, the overall effect of Priming Condition is
significant, F(3,51) = 10.19, p < .001. Moreover, the three affixed primes
differed from the B1 condition both separately and as a group (overall
F(3,51) = 9.70, p < .001); they did not differ from each other (all Fs less
than one). This implies no substantial difference between neutrally- and
nonneutrally-affixed derivations in their ability to prime an unaffixed
morphological relative.

Discussion 

The major outcome of the present study is that there is essentially no
loss in repetition priming when the orthographic or phonological
representations of affixed primes and morphologically-related targets do not
fully overlap. Because we found no effect on priming of differences in form
between prime and target, we could not separate effects of spelling and sound
differences as intended. Experiment 4 will address that issue once again. We
did find a suggestion, significant in the subjects analysis only, of a small
loss in priming when an affixed prime precedes a base target as compared to
exact repetition priming, but there is no significant additional loss when the
affixed form differs in sound or sound and spelling from the base. This shift,
too, was a shift from regularly-affixed words to largely irregular forms.
Thus, we found no loss in priming between regularly-affixed forms and their
irregular morphological relatives.⁴ Accordingly, we conclude that, however
repetition priming effects are explained (as repeated access to a common
lexical entry, as priming among strongly associated but distinct entries, or in
some other way), the relationships of irregular and regular, derived, inflected
and unaffixed forms must be explained in fundamentally the same way.

Experiment 4 

We designed the final experiment with two main purposes in mind. One was 
to compare priming in the auditory and visual modalities. In Morton's logogen 
model, each logogen has paired auditory and visual inputs (Morton, 1981). 
That is, a word has a logogen (in the model's most recent version, an "output
logogen," but not an "input logogen") in common whether it is auditorily or
visually presented. This idea is supported by findings of some cross-modal
repetition priming (Kirsner, Milech, & Standen, 1983). However, whereas in
Experiments 3a and 3b we found strong priming of visually-presented unaffixed
words by irregular morphological relatives, Kempley and Morton (1982) found no
priming between auditorily-presented unaffixed words and irregular, inflected
morphological relatives. Kempley and Morton used different stimuli than we
did and a different paradigm with longer lags between prime and target.
Consequently a variety of reasons for this difference are tenable. In the
present study, we use common word sets and a common paradigm to compare
priming in the two modalities directly.

Our second purpose was to examine priming when affixed words appear as 
targets in the repetition-priming paradigm. This allows us to address two
questions, one theoretical and one methodological. The first question 
concerns the organization of morphological relatives in the lexicon. One 
possibility is that all morphologically-related words are uniformly related to 
each other in the lexicon. Other possibilities can be imagined as well, how- 
ever. One may be developed by analogy from a theory of lexical organization 
in Serbo-Croatian, a highly inflected language (Lukatela, Gligorijević, Kostić,
& Turvey, 1980). In that so-called "satellite-entries" theory, a particular 
inflected form, the nominative, rather than the root morpheme, is proposed as 
the hub of an array of associated morphologically-related words (satellites). 
Inflected words other than the nominative are associated to the nominative 
form but not (or less strongly) to each other. In this organization, the 
nominative should prime and be primed by other morphologically-related affixed 
forms more effectively than the affixed forms prime each other. In English, 
the unaffixed base form is the most likely counterpart to the nominative in 
Serbo-Croatian. If English has an analogous organization, then the unaffixed 
word should prime and be primed by affixed forms more effectively than affixed 
forms prime each other. Our experiment is designed to discriminate between 
these views by examining priming of affixed words by unaffixed and other af- 
fixed morphological relatives. 
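
The contrast between the two organizations can be made concrete with a small
sketch. It is a deliberate simplification and not a model proposed in the
text: links are treated as all-or-none, whereas the satellite-entries theory
allows satellites to be associated "less strongly" rather than not at all.
The function names are hypothetical, and the triad "heal," "healer,"
"health" is borrowed from the Experiment 3 materials for illustration.

```python
from itertools import combinations


def uniform_links(forms):
    """Every morphological relative is linked to every other one."""
    return {frozenset(pair) for pair in combinations(forms, 2)}


def satellite_links(hub, satellites):
    """Each satellite is linked to the hub only, not to other satellites."""
    return {frozenset((hub, s)) for s in satellites}


def linked(links, a, b):
    """True if forms a and b are directly linked in this organization."""
    return frozenset((a, b)) in links


family = ["heal", "healer", "health"]
uniform = uniform_links(family)
satellite = satellite_links("heal", ["healer", "health"])

# Base priming an affixed target: a direct link under both views.
print(linked(uniform, "heal", "healer"), linked(satellite, "heal", "healer"))
# Affixed form priming another affixed form: direct only under the uniform view.
print(linked(uniform, "health", "healer"), linked(satellite, "health", "healer"))
```

On this simplified reading, the two views make the same prediction when a
base primes an affixed target and differ only when one affixed form primes
another, which is the comparison the experiment is designed to provide.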

The methodological question concerns the possibility that the patterns of
priming that we obtain using our paradigm are largely products of a para- 
digm-specific strategy by which subjects predict the target given the prime. 
Forster and Davis (1984) and Oliphant (1983) have shown that repetition prim- 
ing is severely diminished (and absent in Oliphant's study) if subjects are 
unaware that words are repeated in the experiment. In the work of Forster and 
Davis, some subjects are made unaware of the repetitions because the prime is 
masked. Repetition priming is small, short-lived and, in at least one respect 
(absence or presence of a frequency-by-priming interaction), qualitatively 
different in pattern from priming observed when subjects are aware of the 
prime. 

In other research (Napps, in preparation), one of us has also found a re- 
duction in the magnitude of repetition priming when the proportion of targets 
in the experiment is only .06 of all stimulus items. Nonetheless, even under 
these conditions, significant priming is found using the Sound and Spelling 
stimuli of Experiment 3 out to the longest lag examined in that experiment (10 
intervening items). Napps' findings in this study and in others using low
proportions of repeated items suggest that the priming we obtain with a high 
proportion of related items does not create the appearance of relations among 
morphological relatives that are unrelated in the lexicon. Rather, they en- 
hance effects of existing relations. 

To further address the question whether our priming reflects lexical 
organization, or instead reflects predictability of the target given the 
prime, we designed Experiment 4 to reduce the subjects' ability to make useful
predictions. In Experiments 1-3, targets were always unaffixed words.
Accordingly, given a prime, subjects could guess the identity of the target
word that would appear some 50 items later in the next block of stimuli. In
Experiment 4, targets were less predictable than in earlier experiments
because they were one of several possible affixed morphological relatives of
primes.

As a second assessment of the role of prediction, we provide a separate
analysis of repetition priming effects on the very first block of the
experiment in which repetitions occur, and thus before subjects have an
opportunity to develop a strategy of guessing targets from primes.⁵



Methods 

Subjects. Subjects were 72 students from the same subject pool used
previously. Thirty-six students participated in each of Experiments 4a and
4b. All subjects in Experiment 4b had normal hearing.

Stimulus materials. The materials were those used in Experiment 3, with
one exception. In the test lists, the NC affixed form replaced the unaffixed 
form in all positions in which it occurred as a target. This yielded priming 
conditions NC1 (first occurring affixed item), NCNC (affixed word primed by 
itself), BNC (affixed item primed by the unaffixed form), CNC (affixed item 
primed by an affixed morphological item that does not preserve the pronuncia- 
tion or the spelling and pronunciation of the unaffixed morpheme). 

For Experiment 4b, stimulus items were recorded onto audio tape by a fe- 
male native speaker of English (CAF). These productions were sampled by 
computer at 10 kHz. This enabled the same token of each NC prime or target
item to be used in all conditions. The test orders were recorded on one
channel of an audio tape. Tone bursts were recorded on the second channel of
the tape for purposes of collecting response times. The tone bursts were
synchronized to the onsets of acoustic energy of each stimulus item in the test
order. Therefore response times include word duration (or as much of the word
as occurred before the subject made his or her button-press response). That
stimulus words have different durations is unimportant in the repetition
priming procedure because critical comparisons involve response times made to
the same items across different priming conditions. Stimulus items were
recorded onto audio tape with a three-second inter-stimulus interval.

Only three test lists were used in Experiment 4b as compared to the 12
used in Experiments 2, 3 and 4a. The three lists had the same order of
stimulus items but differed in respect to which of the three prime types
occurred
with each target item. It was infeasible to include the additional test ord- 
ers needed to counterbalance the block in which each stimulus item appeared as 
prime and target. 
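
The rotation of prime types across the three lists can be sketched as follows. This is only a minimal illustration, assuming a simple Latin-square rotation; the function name, the rotation rule, and the choice of three example targets (taken from Appendix C) are hypothetical and are not a description of how the original lists were actually built.

```python
# Hypothetical sketch: rotate the three prime types across three test lists
# so that each target item is paired with each prime type exactly once.
# NC = the affixed target itself, B = unaffixed base, C = irregular relative.
PRIME_TYPES = ["NC", "B", "C"]


def assign_primes(targets, n_lists=3):
    """Return one {target: prime_type} mapping per list (Latin-square rotation)."""
    lists = []
    for list_index in range(n_lists):
        assignment = {}
        for item_index, target in enumerate(targets):
            # Shift the prime type by one position for each successive list.
            prime = PRIME_TYPES[(item_index + list_index) % len(PRIME_TYPES)]
            assignment[target] = prime
        lists.append(assignment)
    return lists


if __name__ == "__main__":
    example = assign_primes(["healer", "signing", "dreamer"])
    for number, assignment in enumerate(example, start=1):
        print(number, assignment)
```

Under this assumed scheme the order of items is identical in every list and only the prime paired with each target changes, which matches the constraint described above.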

Procedure. The procedure for Experiment 4a was identical to that for the
previous experiments. 

In Experiment 4b, subjects listened over headphones to binaural
presentations of the test list. A New England Digital Able 40 minicomputer monitored
the second tape channel for the tone bursts and started a millisecond clock 
when one was detected. The clock was read and a response and response time 
were stored when subjects pressed the labeled "word" or "nonword" button on 
the computer-terminal keyboard. If a response was not made within 2.5 seconds 
following stimulus presentation, the computer stopped the tape recorder and 
printed, "Please make a response" on a CRT screen facing the subject. Receipt 
of the button-press response restarted the tape recorder. The tape recorder 
was also stopped between blocks as subjects received feedback on their mean 
response times and accuracies for the block. Subjects initiated successive 
blocks by hitting a key on the terminal keyboard. 

Design. In both experiments, subjects participated at all levels of the
independent variables, Priming Condition (NC1, NCNC, BNC, CNC) and Stimulus
Set (Sound Only, Sound and Spelling). The major dependent measure was
response time.

Results 1 

Errors and extreme response times were eliminated from the analysis as in 
the earlier experiments. Table 4 provides the mean response times and errors 
for Experiments 4a and 4b. 

Separate two-way repeated-measures analyses of variance were performed on
the response times of Experiment 4a using subjects and items as random
factors. The independent variables were Prime Condition (NC1, NCNC, BNC, CNC)
and Stimulus Set (Sound Only, Sound and Spelling). In both analyses, the
effects of Prime Condition (subjects: F(3,105) = 14.79, p < .001; items:
F(3,138) = 16.46, p < .001) and the interaction (subjects: F(3,105) = 4.29,
p = .007; items: F(3,138) = 3.00, p = .03) were significant. Scheffé's tests
performed on the two stimulus sets separately show that, for the Sound Only
condition, all three primed conditions differ from the unprimed condition and
Table 4

Mean Response Times (in ms) in Experiments 4A (Visual) and 4B (Auditory)

                           NC1      NCNC         BNC          CNC

Words
  Experiment 4A
    Sound only             633      571 (.05)    585 (.04)    580 (.07)
    Sound and spelling     6[?]7    574 (.07)    591 (.07)    646 (.08)
  Experiment 4B
    Sound only             796      734 (.09)    770 (.07)    780 (.11)
    Sound and spelling     807      754 (.05)    762 (.06)    772 (.10)

Nonwords
  Experiment 4A            761      757 (.13)    771 (.14)    768 (.11)
  Experiment 4B            861      868 (.15)    862 (.16)    863 (.18)

Note. Error rates are in parentheses.




do not differ from each other. For the Sound and Spelling condition, however, 
whereas the B and NC primes were effective, the C prime did not lead to re- 
sponse times significantly faster than the no-prime condition. 

With two exceptions, the outcome of Experiment 4b was very similar to that
of Experiment 4a. In the analysis of response times to auditorily presented
targets, only the effect of Prime Condition was significant (subjects:
F(3,105) = 34.90, p < .001; items: F(3,138) = 13.46, p < .001). Neither the
main effect of Stimulus Set nor the interaction approached significance. The
nonsignificant interaction contrasts with the outcome of Experiment 4a. The
absence of an interaction between Stimulus Set and Priming Condition with
auditory presentation is not surprising in view of the fact that in Experiment
4a the interaction could be ascribed to the presence or absence of spelling
differences between prime and target. The loss of the interaction indicates
that we succeeded in matching the Stimulus Sets along other relevant
dimensions.

Scheffé's tests on the effect of prime condition showed that all three
primed conditions had shorter response times than the unprimed condition. In 
addition, however, the exact repetition condition differed significantly from 
the other priming conditions on both subjects and items analyses. This 
statistically partial priming is the second contrast with the outcome of 
Experiment 4a. 

In view of the apparent effect of changing spelling between affixed
primes and targets in Experiment 4a only, we compared the outcomes of the
visual and auditory experiments explicitly. We transformed response times to
difference scores by subtracting response times in the BNC condition from
those in the CNC condition separately for the Sound Only and Sound and
Spelling stimulus sets. This provides an estimate of the effects of changing
pronunciation alone (Sound Only words) or of changing both pronunciation and
spelling (Sound and Spelling words) between prime and target with visual and
auditory presentation. We performed analyses of variance on the difference
scores with factors Experiment and Stimulus Set. The effect of Stimulus Set
(subjects: F(1,70) = 4.67, p = .03; items: F(1,92) = 3.49, p = .06) and the
interaction (subjects: F(1,70) = 4.12, p = .03; items: F(1,92) = 3.18, p =
.07) were significant in the subjects analysis and marginally significant in
the items analysis. Planned comparisons on the interaction in the subjects
analysis showed that the effect of a spelling difference was greater with
visual than auditory presentation (F(1,70) = 5.10, p = .02); the difference
between the modalities of presentation on the effect of pronunciation alone
(Sound Only) was nonsignificant (F < 1).
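
The difference-score transformation can be illustrated with the cell means for word targets in Table 4. This is a sketch only: the analyses reported above were run on per-subject and per-item scores, which are not available here, so the code works at the level of cell means, and the function and variable names are assumptions introduced for illustration.

```python
# Cell means (ms) for word targets, copied from Table 4.
MEANS = {
    ("4a", "Sound Only"): {"BNC": 585, "CNC": 580},
    ("4a", "Sound and Spelling"): {"BNC": 591, "CNC": 646},
    ("4b", "Sound Only"): {"BNC": 770, "CNC": 780},
    ("4b", "Sound and Spelling"): {"BNC": 762, "CNC": 772},
}


def difference_score(cell):
    """Cost of an irregular prime relative to a base prime: CNC minus BNC."""
    return cell["CNC"] - cell["BNC"]


def difference_table(means):
    """Map each (experiment, stimulus set) cell to its CNC - BNC difference."""
    return {key: difference_score(cell) for key, cell in means.items()}


if __name__ == "__main__":
    for (experiment, stimulus_set), diff in difference_table(MEANS).items():
        print(f"Experiment {experiment}, {stimulus_set}: {diff:+d} ms")
```

At the level of cell means, the differences are -5 ms (4a, Sound Only), 55 ms (4a, Sound and Spelling), and 10 ms for both stimulus sets in 4b. Only the visual Sound and Spelling cell stands out, which is the pattern the reported interaction describes.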

One more analysis of the data from each experiment was performed. To ask 
whether a subject's ability to guess the target from the prime accounts for
priming effects, we examined primes in the first test block and their repeated 
targets or morphologically-related targets in the second block in Experiments 
4a and 4b. 

In Experiment 4a, across subjects, all items appeared as primes in the 
first block and as targets in the second. In Experiment 4b, this 
counterbalancing was infeasible; therefore, just one fourth of the items in 
each condition appeared as primes and targets in the first two blocks. 

Restricting our analysis to the primes in the first test block and their
targets in the second, in Experiment 4a, the effects of Priming Condition are
highly significant in both subjects and items analyses (subject: F(3,105) =
8.89, p < .001; item: F(3,138) = 9.44, p < .001). The effect of Stimulus Set
(Sound Only, Sound and Spelling) was significant in the items analysis only;
the interaction did not approach significance in either analysis. Means in
the four priming conditions, NC1, NCNC, BNC, and CNC were 684, 567, 607, and
622 collapsed over stimulus sets. These times conform closely to means
computed over all blocks presented in Table 4. A planned comparison of means
in the NC1 (unprimed) and CNC (primed by an irregular form) conditions was
significant (subject: F(1,105) = 7.23, p = .008; item: F(1,138) = 7.41, p =
.007), confirming that priming among regular and irregular affixed forms is
present even when subjects are not aware that primes or their morphological
relatives will be presented later in the experiment.

The same analysis performed on the first two blocks of trials in
Experiment 4b gave essentially the same outcome. In that set of analyses, the
effect of Priming Condition was significant (subject: F(3,105) = 11.82, p <
.001; item: F(3,138) = 6.76, p = .001). No other factors were significant.
Means were 7[?]7, 695, 753, and 740 for NC1, NCNC, BNC, and CNC priming
conditions, respectively. A planned comparison of the conditions NC1 and CNC was
significant (subject: F(1,105) = 9.16, p < .001; item: F(1,138) = 7.98, p =
.001).

The reaction-time means and the pattern of significant effects in these
restricted analyses conform closely to those obtained in the overall analyses. 
Thus, they confirm that repetition priming in the lexical decision paradigm 
does not require a strategy of predicting targets from primes as the primes 
are presented.

Discussion

We designed Experiments 4a and 4b to address three questions. The first
was whether the logogen model, with its paired acoustic and visual input
logogens, was tenable, particularly in light of our findings in Experiment 3
as compared to those of Kempley and Morton (1982). In Experiment 3, we found
that visually-presented irregular words do prime their unaffixed relatives
fully. In contrast, Kempley and Morton (1982) found that auditorily-presented
unaffixed words and their irregular inflected relatives do not prime each
other. In the present study, we found very similar priming in the two
modalities.

A second question was whether we would find evidence of asymmetrical
relations among morphological relatives as researchers have found for
Serbo-Croatian (Lukatela et al., 1980). The experiment failed to support an idea
that morphological relatives have a satellite organization, with the unaffixed
base word as the center of the satellite. Instead, with one exception, all
relationships among morphological relatives appeared strong.

We did obtain one outcome suggesting both a difference between auditori- 
ly- and visually-presented words in the lexicon and suggestive of a satellite 
organization among orthographically-represented words. We found that, with 
visual presentation, whereas base words are primed essentially fully by af- 
fixed morphological relatives not sharing either the spelling or the 
pronunciation of the shared morpheme (Experiment 3), affixed targets that pre- 
serve the spelling and pronunciation of the unaffixed morpheme are not 
(Experiment 4a). This loss in priming apparently can be ascribed to the
spelling difference between the affixed forms since an analogous effect was not
obtained in the auditory version of the experiment (Experiment 4b). Further
evidence will be needed to determine whether this single outcome suggestive of
different organizations for phonetic and orthographic forms of words is found 
reliably. 

A final question addressed by the experiments was whether our procedure
creates priming effects by inviting subjects to generate candidate targets
when primes are presented. We answered this question in the negative based on
two sources of evidence. First, priming occurs over lags of nearly 50 items
even when the target is not highly predictable from the prime. More
convincing, perhaps, is the significant priming in the first two blocks of test
trials in which subjects would have no reason to adopt a guessing strategy.
These analyses yielded mean response times and patterns of significant effects
remarkably similar to those of the overall analyses. In particular, priming
even by irregular forms remained strong in analyses of both visually- and 
auditorily-presented words. Therefore, we ascribe the difference in outcome 
between our studies and that of Kempley and Morton either to differences in 
the items used or to a longer time lag between prime and target in the experi- 
ment by Kempley and Morton (1982). The latter appears more likely. Kempley 
and Morton used inflected forms only, and, if there is a difference in 
strength of priming at all between inflected and derived forms, priming by 
inflected forms should be stronger.

General Discussion

Our major findings can be summarized as follows. We found that losses in
priming from full to partial or less, when exact repetition priming is
compared with priming by morphological relatives, may be ascribed at least in
part to episodic contributions to repetition priming that are larger the more
similar the prime and target. By reducing the contribution of these sources
of repetition priming, we find strong priming (statistically full in most
cases) among inflected, derived and unaffixed words, and between regular and
irregular words, with either auditory or visual presentation. Accordingly, if
repetition priming is interpreted as reflecting lexical organization as we 
assume, then our findings eliminate a theory of lexical organization in which 
regular inflected forms, but not derived forms or irregular inflections, share 
a lexical entry with the base. Correspondingly, they eliminate a theory in 
which the domain of a lexical entry is just those words that can be generated 
by productive, grammatical rules of affixation (see Butterworth, 1983, for a 
similar conclusion). 

Our findings invite either of two extreme interpretations previously
contrasted in the literature (e.g., Butterworth, 1983). One is that full
repetition priming (after Stanners et al., 1979) or full and partial priming (after
Murrell and Morton, 1974) reflect a lexical entry shared by primes and
targets. Therefore, they signal that inflected, derived, regular and irregular
morphological relatives share a lexical entry. This interpretation offers a
way of capturing the large differences in longevity that have been found
between repetition priming and semantic priming in the literature (cf.
Henderson, 1984). Whereas we have found priming even when nearly 50 items intervene
between prime and target, in studies of semantic priming, priming is absent by
a lag of 1 or 2 items (Dannenbring & Briand, 1982; Davelaar & Coltheart, 1975;
Gough, Alford, & Holley-Wilcox, 1981; Meyer, Schvaneveldt, & Ruddy, 1972; see
also Henderson, 1984, for a direct comparison of semantic and repetition
priming).

An unappealing consequence of adopting this interpretation, however, is 
that the concept of lexical entry is severely weakened. Entries that are as 
encompassing as our findings imply lack any obvious utility for the language 
user. The entries cannot serve as input to regular rules of affixation. Indeed,
rather than consisting of the stem morpheme, affixed by rule, each entry
perhaps must be considered a cluster of tightly associated affixed and 
unaffixed morphological relatives — a conceptualization not very distinct from 
the second interpretation we will consider. A second unattractive property of 
the present interpretation is that each entry cannot be associated necessarily 
with any semantic information at all that is common to words within the domain
of the entry (cf. Aronoff, 1976) or to any one syntactic class. Moreover, if 
the entries are logogens, they do not keep an accurate frequency-based expect- 
ancy for all words within the domain of the entry. 

An alternative interpretation questions whether semantic and repetition 
priming are, in fact, qualitatively distinct. Possibly, morphologically-related
words that prime each other over very long lags are distinct words
in the lexicon that are strongly related semantically. If so, then, there are
no grounds for using the priming effects as a basis for inferring sharing of 
lexical entries. One advantage of this hypothesis is that just one mechanism, 
not two, is required to account for priming. A second advantage is that
language users are not presumed to have lexical entries that encompass
syntactically and semantically diverse morphological relatives.

Along with other researchers (e.g., Henderson, 1984; Morton, 1981),
however, we are skeptical that morphological priming is exhaustively semantic.
For one thing, researchers attempt to use words with the strongest associa- 
tions or the maximum semantic relatedness when they test for semantic priming; 
nevertheless semantic priming does not approach the longevity of repetition 
priming under comparable conditions. Second, derived words tend to drift 
semantically after they are coined so that their meaning is not a simple 
compositional function of the meaning of the stem plus that of the affix 
(Aronoff, 1976); therefore, derived words tend to be less semantically related 
to morphological relatives than are inflected words. However, we obtain
equally strong priming from words of both types.

In any case, it may not be necessary to choose between a view that 
repetition priming reflects repeated access to an entry and one that it re- 
flects associations among words in the lexicon. A third perspective on the 
lexicon may capture the best features of both of these views. The perspective 
that we propose is derived from recent network models of the lexicon (e.g., 
Dell, 1980, 1984; McClelland & Rumelhart, 1981; Stemberger, 1982), in particu- 
lar Dell's model, which is designed to produce speech and, in so doing, to 
generate natural slips of the tongue. Dell's model provides a more useful 
source than the more obviously related model by McClelland and Rumelhart 
(1981), designed to generate aspects of word-recognition behavior, because
Dell's model includes a required representation of morphological structure. 
His model has not been extended to orthographic representations of words, but 
there are no principled barriers to doing so. 

In Dell's network model, the lexicon is a hierarchy of levels of 
representation including words, morphemes, syllables, syllable constituents, 
phonemes, and phonetic features. Words such as "swimmer" and "swimming" have 
distinct word representations (called "nodes") but connect to a common 
stem-morpheme node and from there to common syllable and phoneme nodes for the 
shared stem morpheme. Word nodes also have connections to semantic memory, 
where, presumably, "swimmer" and "swimming" connect to common and to distinct 
concepts. A word such as "swift" has distinct word, morpheme and syllable 
nodes from "swimmer," but some common phonemes. Finally, a word such as 
"drown" is unconnected to "swimmer" and its constituents at any level in the 
lexicon, but shares concepts with it in semantic memory. 
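
The four examples in this paragraph can be laid out as a small graph. The sketch below is a toy fragment written for illustration and is not Dell's implementation: the syllable, syllable-constituent, and feature levels and the suffix morphemes are omitted, only downward connections are stored, and the phoneme spellings and concept names are assumptions.

```python
# Toy fragment of a layered lexical network. Node labels, the phoneme
# spellings, and the concept names are illustrative assumptions only.
EDGES = {}  # (level, label) -> set of constituent nodes one step down


def link(parent, *children):
    """Record connections from a node to its constituents."""
    EDGES.setdefault(parent, set()).update(children)


def constituents(node):
    """Every node reachable from `node` by following connections downward."""
    found, stack = set(), [node]
    while stack:
        for child in EDGES.get(stack.pop(), ()):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def shared_levels(word_a, word_b):
    """Levels at which two words have at least one node in common."""
    common = constituents(("word", word_a)) & constituents(("word", word_b))
    return {level for level, _ in common}


def phonemes(*labels):
    """Build phoneme nodes from rough, spelling-like labels."""
    return [("phoneme", label) for label in labels]


link(("word", "swimmer"), ("morpheme", "swim"), ("concept", "WATER"), ("concept", "PERSON"))
link(("word", "swimming"), ("morpheme", "swim"), ("concept", "WATER"), ("concept", "ACTIVITY"))
link(("word", "swift"), ("morpheme", "swift"), ("concept", "FAST"))
link(("word", "drown"), ("morpheme", "drown"), ("concept", "WATER"), ("concept", "DANGER"))
link(("morpheme", "swim"), *phonemes("s", "w", "i", "m"))
link(("morpheme", "swift"), *phonemes("s", "w", "i", "f", "t"))
link(("morpheme", "drown"), *phonemes("d", "r", "au", "n"))

if __name__ == "__main__":
    for other in ("swimming", "swift", "drown"):
        print("swimmer /", other, "->", sorted(shared_levels("swimmer", other)))
```

In this fragment "swimmer" and "swimming" overlap at the morpheme, phoneme, and concept levels, "swimmer" and "swift" overlap only in phonemes, and "swimmer" and "drown" overlap only in concepts, mirroring the three kinds of relation described above.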

The structure of the model is well-suited, in general, to explain our 
pattern of findings. It gives morphological relatives closer ties to each 
other (other things equal) than to other words in the lexicon; yet it does so 
without either requiring morphological relatives to share a common word node 
or treating morphological relations as semantic. Moreover, it can explain why 
we and others (Kempley & Morton, 1982; Murrell & Morton, 1974; Stanners et 
al., 1979) consistently find numerically or even statistically weaker priming 
when prime and target are not exactly the same word as when they are. 

One difficulty with the model, however, is that it does not allow irregu- 
lar words such as "heal" and "health" to share a morpheme node as it must to 
explain our priming in Experiments 3 and 4. It is prevented from doing so
because the syllable structure and phonemic constituents of a word are
elaborated at hierarchical levels leading from the morpheme nodes, thereby requiring
that morphemes sharing a node have the same pronunciation. The model could be
adjusted by having the syllable level and the levels below it connect directly
to the word nodes and not to the morpheme level. Morphological structure,
then, would be a hierarchical level independent of levels of phonological 
structure. This kind of separation may have independent motivation from theo- 
ries of metrical structure in linguistics (e.g., Selkirk, 1980). However, it 
remains to determine whether Dell's model, so modified, would produce natural 
patterns of speech errors involving morphological structure. 

Although the structure of the network model just outlined provides an 
interesting alternative to both views of the lexicon usually contrasted in the 
repetition priming literature, the processing assumptions of a network model 
cannot handle repetition priming at the lags over which we observe it. In
Dell's model, nodes at each hierarchical level are connected by bidirectional
excitatory lines of association. Activation of a node is progressively
incremented as activation spreads from it to its associated nodes and back again.
To prevent every node in the lexicon from being activated eventually,
activation of a node is shut down once the relevant unit has been output by the
system (in Dell's model, once a phoneme or word has been spoken). For a variety
of reasons, activation does tend to rebound after a node's activation has been
shut down; this promotes perseveration errors in speech (for example [from
Dell, 1980]: "to the bank to pick up some money" produced as "to the bank to
pick up some bank"), and it may explain repetition priming of the magnitude and
longevity observed by Forster and Davis (1984) and by Napps (in preparation)
when subjects are unaware of repetitions in the experiment. However, activa- 
tion lasting for 48 subsequent items (or two days as Scarborough et al., 1977, 
have observed) would have disastrous consequences for the model's normal 
operations. Evidently, priming of the longevity we observe is strategic; 
possibly, it can be seen in the context of the model as strategic maintenance 
of activation of a node previously activated by stimulus input. This strate- 
gic activation would play no role in ordinary speech and reading, but can be 
exploited as we have done to strengthen repetition priming processes that re- 
veal the organization of words in the lexicon. 
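
The qualitative behavior described in this paragraph (spread over bidirectional links, shutdown after output, a short rebound, and fading over long lags) can be sketched with a two-node toy simulation. The update rule and the parameter values below are arbitrary assumptions chosen for illustration; they are not the equations or parameters of Dell's model.

```python
# Toy two-node network; all parameter values are arbitrary assumptions.
LINKS = {"word": ["morpheme"], "morpheme": ["word"]}  # one bidirectional link


def step(activation, links, spread=0.25, retain=0.5):
    """One synchronous update of every node.

    Each node keeps `retain` of its own activation and receives `spread`
    times the activation of every node it is linked to.
    """
    return {
        node: retain * level + spread * sum(activation[n] for n in links[node])
        for node, level in activation.items()
    }


def simulate(lag):
    """Activate 'word', shut it down after one step, then run `lag` more steps.

    Returns the activation of 'word' after each of those `lag` steps.
    """
    state = step({"word": 1.0, "morpheme": 0.0}, LINKS)
    state["word"] = 0.0  # node is shut down once the unit has been output
    history = []
    for _ in range(lag):
        state = step(state, LINKS)
        history.append(state["word"])
    return history


if __name__ == "__main__":
    trace = simulate(48)
    print("first step after shutdown:", trace[0])
    print("after 48 steps:", trace[-1])
```

With these assumed values the shut-down node rebounds to a small positive activation on the next step, because its neighbor feeds activation back, and then fades to a negligible level well before 48 steps have passed. Passive spreading of this kind therefore cannot by itself carry priming across the lags observed here.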
Footnotes

1 The response times we report here differ in absolute value from times
reported in Napps and Fowler (1983) and Napps, Fowler, and Feldman (1984). 
The procedures we use to present stimuli to the computer-terminal screen and 
to collect response times create constant errors. The present response times 
have been adjusted for those constant errors. The times in the earlier 
presentations were unadjusted. The adjustments do not affect the size in ms 
of priming effects. 

2 One outcome in Experiment 1 is at apparent odds with the conclusion that
some of the priming on words is episodic. We would expect the IB condition to
give rise to slightly longer response times than the exact-repetition (BB) 
condition. A small difference (7 ms) in the appropriate direction does occur 
in the Inflections and Derivations stimuli, but it is reversed (-3 ms) in the 
Inflections Only stimuli. However, looking across experiments of our own and 
of others in the literature in which a comparison can be made, in six of eight 
comparisons IB exceeds BB. The differences are always small and usually 
nonsignificant. That they are small is not surprising, however. Inflections 
and base forms are orthographically and phonologically very similar. Moreo- 
ver, it is possible that more lexical information than simply word forms con- 
stitutes an episodic trace in our experiments. Much of that additional infor- 
mation will be the same for inflections and base forms. 

3 Another assessment of episodic priming in the present experiment may be
obtained by comparing response times to words in Experiments 2a and b with
corresponding times in Experiment 1. Although the mean response times may
differ across the experiments due to differences in lag, in subjects, and, in
Experiment 2b, stimulus materials, there will be no loss in episodic priming
in the B1 condition of Experiments 2a and b as compared to Experiment 1 and
therefore B1 response times should be closest across the experiments. For the
same reason, DB conditions should show little change when episodic priming is
eliminated. The BB and IB conditions should show a relative increase in
response time, however. With just one notable exception, the outcomes of
Experiments 2a and b are consistent with the predictions. Conditions B1 in
Experiment 2a and B1 and DB in Experiment 2b show less change from their
corresponding times in Experiment 1 than (respectively) conditions IB in
Experiment 2a and BB in Experiment 2b. The exceptional point is the response time
to the BB condition of Experiment 2a, which is 6 ms faster than in Experiment
1 rather than being slower as it should be. In light of the supportive
evidence provided by the other conditions and, particularly, by the outcome on
nonwords, we ascribe the one inconsistency to sampling error or perhaps to a
floor on response times in the BB condition of Experiment 1.

4 We should acknowledge, however, that although the difference does not
approach significance, irregular forms prime base forms numerically less than
do regular forms. More generally in our research using repetition priming, in
nearly all instances in which the prime and target are not identical and
repetition priming is statistically full, it is numerically less than full.
This is the case in most comparisons in Experiments 1-4; similar trends can be
seen in the findings of Stanners et al. (1979) and Morton (Morton, 1981;
Murrell & Morton, 1974).

5 This analysis assesses priming when subjects have no reason to attempt 
to predict a future target from a prime. It remains true, however, that by 
the time the targets are first presented, subjects have been exposed to a 
large number of morphologically-complex words. Possibly, this promotes a
tendency to think of morphological relatives of primes. If it does, and thus
if the set of activated relatives can remain activated over lags of 50 items
or more, this finding in itself would be interesting. Moreover, it would
require an explanation in terms of activation within the lexicon, most probably.
Both the capacity and the temporal span of any temporary buffer would be
exceeded by the memory demands required to activate a set of morphological
relatives for each of the two-dozen primes presented within a 48 item span.
In any case, research by Napps (in preparation) showing repetition priming
with very low proportions of morphological relatives suggests that
this cannot be a major source of repetition-priming effects.

Appendix A

Experiment 1 Base Words

enlarge*     replace*     yell         gather
knead        pick         call         adjust*
settle*      attain*      discern*     laugh
sign         mow          retain       rest
weld         list         gash         govern*
walk         equip        push         pull
punish*      paw          agree*       wander
toss         develop*     talk         deploy
enchant      wait         spell        enjoy*
roll         latch        command*     manage*
disagree*    blink        invent       paint
amend        cook         pronounce*   detach*

*Used with both inflectional and derivational affixes.

Appendix B

Experiment 2A and 2B Base Words

Experiment 2A Base Words        Experiment 2B Base Words

enlarge      replace            [illegible]  [illegible]
yell         gather             manage       soft
knead        pick               [illegible]  eager
call         adjust             assess       dark
settle       attain             [illegible]  [illegible]
discern      laugh              employ       [illegible]
sign         mow                enjoy        vague
retain       rest               punish       [illegible]
weld         list               detach       [illegible]
gash         govern             disagree     [illegible]
walk         equip              move         close
push         pull               enforce      glad
punish       paw                though       bold
agree        wander             fruit        blind
toss         develop            help         fond
talk         deploy             power        hard
enchant      wait               harm         awkward
spell        enjoy              care         fresh
roll         latch              rest         rich
command      manage             color        like
disagree     blink              fear         separate
invent       paint              use          vivid
amend        cook               hope         fair
pronounce    detach             thank        polite

Appendix C

Word Triads Used in Experiments 3 and 4

Sound Only                                  Sound and Spelling
Base       No Change     Change             Base       No Change     Change

heal       healer        health             creep      [illegible]   [illegible]
sign       signing       signal             defend     defendant     defensive
dream      dreamer       dreamt             sleep      sleepy        [illegible]
edit       editor        edition            repel      repellent     repulsive
deal       dealing       dealt              speak      speaker       spoke
reside     resided       residence          decide     decided       [illegible]
produce    producible    productive         assume     assumed       assumption
confide    confided      confidence         sweep      sweeping      swept
inhibit    inhibiting    inhibition         invade     invader       invasion
electric   electrical    electrician        persuade   [illegible]   persuasive
bomb       bomber        bombard            space      [illegible]   spatial
mean       meaning       meant              forget     forgetful     [illegible]
grade      grading       graduate           sing       singer        sang
medic      medical       medicine           [illegible] [illegible]  [illegible]
compare    comparative   comparable         induce     inducement    induction
extreme    extremist     extremely          collide    collided      collision
create     creative      creature           describe   described     description
drive      driver        driven             concede    conceded      concession
rise       riser         risen              deep       deeply        depth
revise     revising      revision           picture    picturesque   pictorial
music      musical       musician           propel     propeller     propulsion
lyric      lyrical       lyricism           wise       wisely        wisdom
critic     critical      criticize          clear      clearly       clarify
clean      cleaner       cleanse            forgive    forgiveness   forgave

M. Gurjanov,† G. Lukatela,† Katerina Lukatela,† M. Savić,† and M. T. Turvey††



Abstract. Two experiments examined the effect on lexical decision
times for inflected Serbo-Croatian nouns when the nouns were preceded
by possessive adjectives (my, your, our). For any given pairing
the possessive adjective and the noun always agreed in number
(singular) and case (nominative) but only agreed half of the time in
gender (masculine or feminine). Lexical decisions were faster when
the noun targets were of the same gender as their primes. This
gender congruency/incongruency effect was shown to hold whether the
inflections of the adjective and noun were the same (as is the case
for typical Serbo-Croatian nouns) or different (as is the case for
atypical Serbo-Croatian nouns). The results are discussed in terms
of a post-lexical influence of grammatical processing on the
recognition of individual words.

"Priming" is a term referring to the influence of one stimulus upon the
processing of another. Most experiments on "priming" with word stimuli have
considered words that are associatively related. Where lexical decision
latency is the measure of processing time it has been shown that processing is
more rapid when a word is preceded by an associate compared to when it is
preceded by a nonassociate (Lupker, 1984). Recently other relations between
and among words have come under examination. Goodman, McClelland, and Gibbs
(1981) asked whether lexical decision is speeded when successive words are
instances of word types that ordinarily occur in succession in the language.
These authors found that when two words were syntactically legal (e.g., men
swear) the target word was responded to slightly but significantly faster than
when the two words were syntactically illegal (e.g., whose swear). Wright and
Garrett (1984) used fragments of sentences as the priming context. They found
that the grammatical structure of the incomplete sentence affected the lexical
decision time for a target word that followed it. For example, modal verb
contexts preceding main verb targets and preposition contexts preceding noun



^ Journal of Experimental Psychology : Learning, Memory, and Cognition , 1985, 
U 9 692-701. 

tUniversity of Belgrade. 
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targets yielded shorter decision latencies than the contrary pairings (that 
is, modal/noun and preposition/verb). 

English uses word order as its major syntactical device. A language like
Serbo-Croatian exploits inflection as its primary means of conveying
grammatical information. Experiments on syntactic or grammatical priming in
Serbo-Croatian have preserved the ordinary word-type adjacencies of the
language. The grammatical violations have been introduced at the level of
inflected morphemes. For example, Gurjanov, Lukatela, Moskovljević, Savić, and
Turvey (1985) paired adjectives and nouns in a lexical decision task.
Grammatical agreement requires that the two words be of the same number, case,
and gender. This agreement is to be found at the level of the inflectional
morphemes that are suffixed to the adjective and noun stems. Gurjanov et al.
(1985) violated case agreement and found that lexical decision times for the
noun targets were slower than when the paired words were in full agreement. In
another experiment with nouns, Lukatela, Kostić, Feldman, and Turvey (1983)
observed faster decision times when the noun's inflection was appropriate for
a preceding preposition than when it was inappropriate. And in an experiment
with verb targets by Lukatela, Moraca, Stojnov, Savić, Katz, and Turvey
(1982), lexical decisions were found to be faster when the preceding personal
pronoun agreed in person than when it disagreed in person.

How are these various instances of syntactic influences on lexical decision
to be understood? Where the context for a target word in the lexical
decision task is an associate, expediting lexical decision is often described
as due to an automatic, intralexical process. This process is not consciously
directed. It is simply a consequence of the way in which the lexical memory
is organized (Collins & Loftus, 1975; Forster, 1979). The context
mechanically increases the activation level of the target's location in memory
prior to the processing of the target. This fast mechanical priming is
generally said to be accompanied by a slower, attentional priming. Here the
idea is that the context can induce a directing of the focus of attention to a
particular region of the internal lexicon (Neely, 1977; Posner & Snyder,
1975). Following a distinction suggested by Seidenberg, Tanenhaus, Leiman, and
Bienkowski (1982), contexts that include an associate or semantic relative and
that allow, in principle, the foregoing priming processes are termed "priming
contexts." A priming context contrasts with the context under investigation in
the present paper, namely, a minimal grammatical context. A context of this
latter type, referred to as "nonpriming" by Seidenberg et al. (1982), does not
appear to precipitate automatic spreading activation (Lukatela et al., 1982).
The difference in lexical decision times that accompanies the syntactic
congruency/syntactic incongruency contrast seems to be due to post-lexical
processes rather than lexical processes (Seidenberg, Waters, Sanders, &
Langer, 1984). The important point to be underscored is that lexical decision
is a complex operation. The accessing of the context's and of the target's
representations in the internal lexicon is but one component process. Other
processes might include (1) recognizing the grammatical relation between
context and target and (2) assigning a meaning to the context-target structure
(cf. deGroot, Thomassen, & Hudson, 1982; Forster, 1979, 1982; West &
Stanovich, 1982). If these post-lexical processes are completed before the
internal deadline for emitting a lexical decision, they may influence
positively (to shorten) or negatively (to lengthen) the response latency (West
& Stanovich, 1982).






Gurjanov et al: Grammatical Priming of Inflected Nouns 



The present experiments extend the abovementioned studies on the
grammatical priming of nouns. They examine the situation in which nouns agree
or disagree in gender with the preceding word, a possessive adjective (in
English, my, your, our, etc.). They also examine the sensitivity of the
nominative singular case to priming. The preposition priming study of
Lukatela et al. (1983) did not address this issue directly because the
nominative singular case of a Serbo-Croatian noun is not governed by a
preposition. The study by Gurjanov et al. (1985) did address this issue
directly and yielded a negative result: decision times for nouns in the
nominative singular case were unaffected by case agreement with preceding
adjectives. This issue of the priming sensitivity of the nominative singular
case of nouns is important given the demonstration that this case plays a
central role in the organization of the inflected forms of a noun in the
internal lexicon (Lukatela, Gligorijević, Kostić, & Turvey, 1980). Although
the various cases occur with different frequencies, the evidence suggests that
speed of lexical access is indifferent to case frequency. The nominative
singular is accessed fastest, with the different oblique cases accessed at
roughly the same speed.

The question posed is whether the privileged lexical status of the
nominative singular is associated with a general insensitivity to grammatical
context. Is it possible that case agreement and gender agreement are not of
equal significance? If they are not, then the failure to find an effect of
agreement in case (Gurjanov et al., 1985) may not extend to agreement in
gender. To anticipate, the experimental outcome is that gender agreement does
affect the processing of nouns in the nominative singular.

Experiment 1 

The lexical decision time for any given target noun in the nominative
singular form was measured in two contexts: one in which it was preceded by a
possessive adjective in the nominative singular form and one in which it was
preceded by a visually similar pseudopossessive adjective. For one half of
the noun targets the possessive adjective agreed in gender. It was expected
that if gender agreement influenced the processing of nominative singular noun
forms, then gender agreement would result in faster decisions than gender
disagreement.

The majority of Serbo-Croatian masculine nouns in the nominative singular
case end in a consonant. In comparison, the majority of feminine nouns in the
nominative singular end in A and the majority of neuter nouns end in either O
or E. Some masculine nouns in the nominative singular, however, end in A.
There are some feminine nouns in the nominative singular that end in a
consonant. In the first experiment only typical masculine and feminine nouns
were used. (In the second experiment both the typical and atypical types are
examined.)

Method 

Subjects. Nineteen students from the Department of Psychology, University
of Belgrade, received academic credit for participation in the experiment.

Materials. Letter strings of uppercase letters were typed with an IBM
Selectric Typewriter. The letter strings were used to prepare black on white
slides.

Two types of slides were constructed. In one type, the letter string was
arranged horizontally in the upper half of a 35 mm slide and, in the other
type, letters of the same kind were arranged horizontally in the lower half of
a 35 mm slide. Letter strings in the first type of slide were always
possessive adjectives in nominative singular form (or their pseudoword
analogues), and letter strings in the second type of slide were always
ordinary nouns in nominative singular form (or their pseudoword analogues).
Altogether, there were 144 "possessive adjective" stimuli and 72 "noun"
stimuli with each set evenly divided into words and pseudowords.

The 36 nouns were selected from the middle frequency range of a corpus of
one million Serbo-Croatian words (Kostić, 1965). Half of the nouns were
masculine and half of the nouns were feminine. A different set of 36 nouns
(18 masculine and 18 feminine) of the same frequency was used to generate the
pseudonouns. This was done by simply changing one letter in the root
morpheme. The replacement was an orthotactically and phonotactically legal
letter. Importantly, all "nouns" (words and pseudowords) were five letters in
length and consisted of two syllables. Thirty-six possessive adjective
stimuli were possessive adjectives in the nominative singular form of the
masculine gender: twelve were the first person singular (MOJ = my); twelve
were the second person singular (TVOJ = thy); and twelve were the first person
plural (NAS = our). The other 36 possessive adjective stimuli were the same
possessive adjectives in the same case and in the same proportion but of the
feminine gender (MOJA, TVOJA, and NASA). In addition to these 72 possessive
adjective stimuli another 72 "possessive adjective" stimuli were constructed
with the pseudoword analogues of the three masculine and feminine possessive
adjectives, namely, MEJ, TLOJ, LAS, MEJA, TLOJA, LASA.

In total, a subject was presented 144 pairs of stimuli in the experimental
session. Sixteen other different pairs of stimuli were used for the
preliminary training of subjects.

Design. Each noun was presented two times to a given subject. On the
two occasions a noun was presented, it was preceded by a possessive adjective
on one occasion and by a pseudopossessive adjective on the other occasion.
Importantly, between the first and second presentation of a given noun there
were always 71 presentations of other pairs. This constraint on the design of
the experiment meant that the 36 nouns and the 36 pseudonouns that were
exposed in a pseudorandom order in the first half of each experimental session
were exposed in the same order in the second half of the session. However,
the priming stimuli in the first and second half of the session were mutually
interchanged. Those nouns and pseudonouns that in the first half of the
session were preceded by possessive adjectives were preceded in the second
half by the corresponding pseudopossessive adjectives, and vice versa. Hence,
a given subject never experienced a given pair of stimuli more than once.
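
The spacing constraint can be illustrated with a minimal sketch. It assumes
only what the Design paragraph states, namely that the second half of a
session repeats the first-half target order with word and pseudoword primes
exchanged. The function names and the placeholder target labels are
hypothetical and are not part of the original materials.

```python
def build_session(first_half):
    """Return the full session order for one subject.

    first_half: list of (target, prime_type) pairs, where prime_type is
    "word" or "pseudo". The second half repeats the same target order
    with the two prime types exchanged.
    """
    swap = {"word": "pseudo", "pseudo": "word"}
    second_half = [(target, swap[prime]) for target, prime in first_half]
    return first_half + second_half


def intervening_pairs(session, target):
    """Count the pairs presented between the two appearances of a target."""
    first, second = [i for i, (t, _) in enumerate(session) if t == target]
    return second - first - 1


# Hypothetical first half: 72 placeholder targets (36 nouns, 36 pseudonouns).
first_half = [(f"target_{i:02d}", "word" if i % 2 == 0 else "pseudo")
              for i in range(72)]
session = build_session(first_half)
lags = {intervening_pairs(session, target) for target, _ in first_half}
```

Under these assumptions every target is separated from its repetition by the
same number of pairs, and no prime-target pair occurs twice in the session.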

As noted, for any given subject a target noun appeared only twice with
one appearance preceded by a pseudopossessive adjective. The other appearance
was preceded by a possessive adjective. The possessive adjective context
could either agree or disagree in gender with the noun. That is, if the noun
were masculine, then the preceding possessive adjective could be either
masculine or feminine. Consequently, for a given subject, the nouns that
occurred in an appropriate possessive adjective context were different from
the nouns that occurred in an inappropriate possessive adjective context. In
summarizing the data in Table 1 the fact that different word sets comprised
the appropriate and inappropriate pairings is marked by the use of two
exemplary masculine nouns, LONAC and SAPUN, and two exemplary feminine nouns,
TABLA and PTICA. There is a further feature of the design to be remarked upon.
If a target noun, say, LONAC, was preceded by a possessive adjective of the
proper gender, say, MOJ, on one of its appearances, then it was preceded by a
visually similar pseudopossessive adjective, say, MEJ, on the other
appearance. Similarly, if LONAC was preceded by the inappropriate context MOJA
on one appearance, it was preceded by the pseudopossessive adjective MEJA on
the other. The design therefore permitted the direct comparison within a
subject of lexical decision times to the same word in two different contexts:
one in which the prime agreed or disagreed grammatically and one in which the
prime was a pseudoword.

To reiterate, a given subject saw 144 different pairs of stimuli: one
quarter of the trials consisted of possessive adjective-noun pairs (half
of which agreed and half of which disagreed in gender), one quarter consisted
of pseudopossessive adjective-noun pairs, one quarter consisted of possessive
adjective-pseudonoun pairs, and one quarter consisted of pseudopossessive
adjective-pseudonoun pairs. The presentation order was pseudorandom.

Procedure. On each trial, two slides were presented. The subjects' task
was to decide as rapidly as possible whether the letter string contained in a
slide was a word. Each slide was exposed in one channel of a three-channel
tachistoscope (Scientific Prototype, Model GB) illuminated at 10.3 cd/m². Both
hands were used in responding to the stimuli. Both thumbs were placed on a
telegraph key close to the subject and both forefingers on another telegraph
key two inches further away. The closer key was depressed for a "no" response
(the string of letters was not a word); and the farther key was depressed for
a "yes" response (the string of letters was a word).

Latency was measured from the onset of a slide. The subject's response 
to the first slide terminated its duration and initiated the second slide (at 
effectively a delay of 0 ms) unless the latency exceeded 1300 ms in which case 
the second slide was initiated automatically. The duration of the second 
slide, unlike that of the first, was fixed at 1300 ms. 



Results

A mean reaction time was computed for each subject on each type of word
pair. Latencies shorter than 300 ms and longer than 1300 ms were excluded as
were latencies associated with incorrect responses. The total exclusions did
not exceed 1.4 percent of all responses. The mean latencies for the primes,
namely, masculine possessive adjective (e.g., MOJ), feminine possessive
adjective (e.g., MOJA), pseudo masculine possessive adjective (e.g., MEJ), and
pseudo feminine possessive adjective (e.g., MEJA) were: 5^2 ms, 5^3 ms, 638
ms, and 637 ms, respectively.
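
The exclusion criteria can be sketched as follows. This is a minimal
illustration under stated assumptions rather than the procedure actually used
to score the data: the function names and the example trials are
hypothetical, and latencies of exactly 300 ms or 1300 ms are treated as
retained because the text excludes only shorter and longer values.

```python
from statistics import mean

MIN_RT_MS = 300    # latencies shorter than this are excluded
MAX_RT_MS = 1300   # latencies longer than this are excluded


def is_valid(latency_ms, correct):
    """A trial is kept only if it is correct and within the latency window."""
    return correct and MIN_RT_MS <= latency_ms <= MAX_RT_MS


def mean_valid_latency(trials):
    """Mean latency over retained trials; None if no trial is retained.

    trials: iterable of (latency_ms, correct) tuples.
    """
    kept = [rt for rt, correct in trials if is_valid(rt, correct)]
    return mean(kept) if kept else None


def exclusion_rate(trials):
    """Proportion of trials removed by the criteria."""
    trials = list(trials)
    excluded = sum(1 for rt, correct in trials if not is_valid(rt, correct))
    return excluded / len(trials)


# Hypothetical trials for one subject in one condition.
example_trials = [(250, True), (600, True), (700, True),
                  (1400, True), (650, False)]
example_mean = mean_valid_latency(example_trials)
example_rate = exclusion_rate(example_trials)
```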

Because of the design of the experiment, a subject saw any given masculine
noun in the nominative singular, for example, LONAC, preceded once by a
masculine possessive adjective in nominative singular, for example, MOJ, and
preceded once by a mutated version of that same masculine possessive
adjective, viz., MEJ. Likewise, the subject saw any given feminine noun in
the nominative singular, for example, PTICA, preceded once by MOJA and once by
MEJA. The same arrangement was true for the incongruent pairings: MOJA SAPUN
with MEJA SAPUN for a masculine noun, and MOJ TABLA with MEJ TABLA for a
feminine noun. These relations and comparisons are displayed in Table 1.



Table 1

Lexical Decision Times for Examples of Masculine and Feminine Nouns Primed by
Real and Pseudopossessive Adjectives

                                                        Noun gender
Type of prime                Prime inflection    masculine            feminine

possessive adjective         Masculine (ø)       608±41 a (LONAC) b   665±39 (PTICA)
                             Feminine (A)        672±27 (SAPUN)       593±36 (TABLA)

pseudopossessive adjective   Masculine (ø)       653±10 (LONAC)       640±36 (PTICA)
                             Feminine (A)        623±42 (SAPUN)       611±27 (TABLA)

a mean reaction time and standard deviation (ms)
b example of noun



Only effects that were significant by both the analysis based on subject
means and the analysis based on item means are reported. The question of
major interest is whether lexical decision times were affected by the
grammatical relation between the prime and the target. This effect, if it
exists, should be found in the two-way interaction between target gender and
prime inflection and the three-way interaction among target gender, prime
inflection, and lexicality. Both interactions proved to be significant:
F(1,18) = 7^.93, MSe = 641, p < .001 and F(1,18) = 52. 'o, MSe = 877, p < .001
by the subject analysis; and F(1,32) = 17.18, MSe = 19220, p < .001 and
F(1,32) = 15.68, MSe = 21794, p < .001 by the item analysis. Also significant
was the main effect of prime inflection: F(1,18) = 19.79, MSe = 291, p < .001
and F(1,32) = 4.10, MSe = 4591, p < .05 by the subjects and items analyses,
respectively. On the average, lexical decisions following the uninflected
primes (e.g., MOJ, MEJ) were slower than those following the inflected primes
(e.g., MOJA, MEJA): 642 ms versus 625 ms.

The analysis supports the hypothesis that lexical decision on a noun in
the minimal grammatical context provided by a possessive adjective depends on
whether or not the noun and possessive adjective agree in gender. For
masculine nouns the difference between the inappropriate pairing and the
appropriate pairing was 64 ms; for feminine nouns it was 72 ms. These
magnitudes are considerably larger than the inappropriate-appropriate
difference reported by Goodman et al. (1981). Comparisons of English word
sequences such as "men swear" (appropriate) and "whose swear" (inappropriate)
yielded small differences of 19 ms (Experiment 1) and 13 ms (Experiment 2).

The grammatical congruent-grammatical incongruent contrast is a reliable
measure of grammatical priming. Less reliable but of larger theoretical
importance is the measure of grammatical priming that divides the congruency
effect into facilitative and inhibitory components. This division rests on the
availability of a suitable baseline. In the present experiment nouns following
pseudowords provide the baseline. What is missing, however, is an independent
evaluation of the effect of pseudowords on lexical decision. Another
weakness of the current baseline is that a pseudopossessive adjective-noun
sequence involves a negative response followed by a positive response, raising
the possibility of an inhibitory influence on the noun decision-making
process. The analysis that follows should be interpreted with these caveats in
mind.

As noted above, because of the design of the experiment it is possible to
make a within-subject comparison of a noun with itself in two different
contexts, namely, those of possessive adjective and pseudopossessive
adjective. Facilitation of lexical decision is here defined operationally by a
significant positive difference between pairs of type MOJ LONAC (congruent
prime) and MEJ LONAC (nonsense prime) or MOJA PTICA (congruent prime) and MEJA
PTICA (nonsense prime), and inhibition of lexical decision is defined by a
significant negative difference between pairs of type MOJA SAPUN (incongruent
prime) and MEJA SAPUN (nonsense prime) or MOJ TABLA (incongruent prime) and
MEJ TABLA (nonsense prime). Protected t-tests (Cohen & Cohen, 1975; the error
term from the ANOVA is used as the estimate of the variance) on subject means
revealed that there was facilitation: t(18) = 4.79, p < .001 and t(18) = 2.29,
p < .05 for the masculine (LONAC) and feminine (PTICA) situations,
respectively; and that there was inhibition: t(18) = 4.^, p < .001 and
t(18) = 2.50, p < .05 for the masculine (SAPUN) and feminine (TABLA)
situations, respectively. These outcomes were nearly corroborated in full by
protected t-tests on item means: t(32) = 3.^9, p < .001 and t(32) = 1.75,
p < .05 for the masculine (LONAC) and feminine (PTICA) situations,
respectively; t(32) = 3.72, p < .001 and t(32) = 1.59, p > .05 for the
masculine (SAPUN) and feminine (TABLA) situations, respectively.
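
The arithmetic behind these comparisons can be sketched from the cell means
of Table 1. The sketch below is illustrative only. The dictionary layout and
function names are hypothetical, the cell for feminine nouns after masculine
pseudopossessive primes is partly illegible in the source and is read here as
640 ms (an assumption), and the sign convention (baseline minus congruent for
facilitation, incongruent minus baseline for inhibition) is chosen so that
both components come out as positive magnitudes. No significance test is
attempted.

```python
# Cell means (ms) keyed by (prime lexicality, prime gender, noun gender).
TABLE1_MS = {
    ("word", "masc", "masc"): 608,
    ("word", "masc", "fem"): 665,
    ("word", "fem", "masc"): 672,
    ("word", "fem", "fem"): 593,
    ("pseudo", "masc", "masc"): 653,
    ("pseudo", "masc", "fem"): 640,   # assumed reading of an illegible cell
    ("pseudo", "fem", "masc"): 623,
    ("pseudo", "fem", "fem"): 611,
}

OTHER = {"masc": "fem", "fem": "masc"}


def congruency_effect(noun):
    """Incongruent minus congruent latency for real possessive primes."""
    return TABLE1_MS[("word", OTHER[noun], noun)] - TABLE1_MS[("word", noun, noun)]


def facilitation(noun):
    """Pseudoword baseline minus congruent prime, same prime inflection."""
    return TABLE1_MS[("pseudo", noun, noun)] - TABLE1_MS[("word", noun, noun)]


def inhibition(noun):
    """Incongruent prime minus pseudoword baseline, same prime inflection."""
    prime = OTHER[noun]
    return TABLE1_MS[("word", prime, noun)] - TABLE1_MS[("pseudo", prime, noun)]


def prime_inflection_mean(prime):
    """Mean latency over the four cells sharing a prime inflection."""
    cells = [ms for (_, p, _), ms in TABLE1_MS.items() if p == prime]
    return sum(cells) / len(cells)
```

With these values the congruency effects are 64 ms and 72 ms for masculine
and feminine nouns, and the prime-inflection means are 641.5 ms and 624.75
ms, in line with the 642 ms and 625 ms reported for the main effect.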

An ANOVA conducted on the pseudonoun data revealed no main effects or 
interactions. 

Experiment 2 

The inflectional morphemes of a masculine possessive adjective in
nominative singular and a typical masculine noun in nominative singular are
identical, viz., ø. Similarly, the inflectional morphemes of a feminine
possessive adjective in nominative singular and a typical feminine noun in
nominative singular are identical, viz., A. The second experiment examines
the contribution of this identity in inflectional morphemes to the gender
congruency/incongruency effect observed in Experiment 1.

As noted above, there are (very few) masculine nouns that end in A in the
nominative singular and (relatively more) feminine nouns that end in ø in the
nominative singular. It is possible, therefore, to have a possessive
adjective and noun that agree in nominative singular case and in gender but
that do not share the same inflected ending, for example, MOJ DEDA (my
grandfather), where both words are masculine nominative singular, and MOJA
MATER (my mother), where both words are feminine nominative singular. The
second experiment exploits pairs of the preceding kind along with pairs
constructed, as before, from typical masculine and feminine nouns, for
example, MOJ LONAC and MOJA PTICA. If the gender congruency/incongruency
effect is not tied to the visual or linguistic identity of the prime and
target suffixes, then the effect should hold for possessive adjective-noun
pairs constructed with atypical nouns as it does for such pairs constructed
with typical nouns. If MOJ LONAC is faster than MOJA LONAC, then MOJ DEDA
should be faster than MOJA DEDA. The latter observation would rule out the
hypothesis that the effect obtained in the first experiment was due to
dimensions of visual similarity rather than grammatical similarity.

The design of the second experiment differed from that of the first. In
the second experiment, unlike the first, no noun or pseudonoun target was
repeated in the sequence of prime-target pairs seen by a subject. In the
second experiment, unlike the first, the nouns preceded by congruent
possessive adjectives were also the nouns preceded by incongruent possessive
adjectives. This was achieved by a between-subjects manipulation. Where one
group of subjects saw a given noun preceded by a grammatically appropriate
prime, another group of subjects saw the same noun preceded by a grammatically
inappropriate prime. The analysis of the experiment focuses on the grammatical
congruency/grammatical incongruency effect. What few merits the analysis into
facilitation and inhibition effects might have had in the first experiment,
given its within-subject comparison of a target noun preceded by a word prime
and a pseudoword prime, were reduced further by the between-subject design of
the second experiment. Consequently, no attempts were made in the second
experiment to quantify facilitation and inhibition.



Method

Subjects. Fifty-two students from the Department of Psychology, University
of Belgrade, received academic credit for participation in the experiment.
Subjects were assigned to one of four subgroups in the order of their
appearance at the laboratory, for a total of thirteen subjects per
subgroup. None of the subjects had participated in Experiment 1.

Materials. The stimuli were of the same physical appearance as in
Experiment 1. Altogether, 128 "possessive adjective" stimuli and 128 "noun"
stimuli were constructed, with each set evenly divided into words and
pseudowords. The 64 real possessive adjective stimuli represented the
possessive adjectives MOJ, MOJA (my) and TVOJ, TVOJA (your). The 64
pseudopossessive adjective stimuli were derived from the possessive adjectives
by replacement of a consonant or a vowel (MEJ, MEJA, MOS, MOSA, FOJ, FOJA,
KVOJ, KVOJA, TVOK, TVOKA, TVEJ, TVEJA).

Thirty-two of the nouns in Experiment 2 were similar to those used in
Experiment 1: there were 16 typical masculine nouns and 16 typical feminine
nouns. In comparison to Experiment 1 an additional set of 32 atypical nouns
was also used: 16 masculine nouns ending in the vowel A and 16 feminine nouns
ending in a consonant. The 64 pseudonouns were generated from these typical
and atypical nouns by replacing the initial or middle consonant by another
consonant of the same phonemic class. Consequently, 32 pseudonouns ended in a
consonant and 32 pseudonouns ended in A.
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In total, there were 512 different pairs of stimuli of which a given sub- 
ject saw 128 pairs. Thirty-two other pairs of stimuli were used for the 
preliminary training of subjects. 

Design . The constraint of the design of the experiment was that a given 
subject never experienced a given noun or pseudonoun more than once. 

As mentioned, a given subject saw 128 different pairs of stimuli. Each
subject saw the same nouns and pseudonouns as every other subject but not
preceded by the same possessive adjective or pseudopossessive adjective type.
Consider, for example, the masculine noun LONAC. In one group of subjects
this noun was preceded by a possessive adjective in the same case, number, and
gender (e.g., MOJ); in a second group it was preceded by a possessive
adjective of the same case and number but of a different gender (e.g., MOJA);
in a third group it was preceded by a pseudoword visually similar to the
congruent prime (e.g., MEJ or MOS or FOJ); and in a fourth group it was
preceded by a pseudoword visually similar to the incongruent prime (e.g., MEJA
or MOSA or FOJA). In one half of the 128 trials the second stimulus in a pair
was a noun, and in the other half the second stimulus was a pseudonoun. In one
half of the 32 possessive adjective-noun trials a given subject saw 8 typical
masculine and 8 typical feminine nouns. There was a similar division for the
32 pseudopossessive adjective-noun trials, the 32 possessive
adjective-pseudonoun trials, and the 32 pseudopossessive adjective-pseudonoun
trials. Within each combination gender-congruent possessive adjectives and
gender-incongruent possessive adjectives appeared equally often.
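
One way to realize a between-subjects assignment of this kind is a Latin
square rotation of prime conditions over groups. The sketch below is an
assumption offered for illustration: the text does not state how nouns were
allotted to groups, and the condition labels, function names, and the
rotation rule itself are hypothetical.

```python
from collections import Counter

# The four prime conditions a given noun meets across the four groups.
CONDITIONS = (
    "congruent adjective",
    "incongruent adjective",
    "pseudoword resembling congruent adjective",
    "pseudoword resembling incongruent adjective",
)
N_GROUPS = len(CONDITIONS)


def condition_for(item_index, group):
    """Latin square rotation: shift the condition by one for each group."""
    return CONDITIONS[(item_index + group) % N_GROUPS]


def build_lists(n_items=64):
    """Map each group to the condition assigned to every noun, in item order."""
    return {
        group: [condition_for(i, group) for i in range(n_items)]
        for group in range(N_GROUPS)
    }


lists = build_lists()
per_group_counts = {group: Counter(conds) for group, conds in lists.items()}
```

Under this rotation every group sees each of the 64 nouns exactly once, each
noun appears in all four prime conditions across the groups, and each group
receives 16 nouns in each condition.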

Procedure . The procedure was the same as in Experiment 1. 

Results 

A mean reaction time was computed for each subject in each of the four 
groups. The criteria for excluding responses were the same as in Experiment 
1. Approximately 3.5 percent of all responses were excluded from the analyses 
by these criteria. 

The first question to be addressed is whether the results of the first
experiment, which were obtained with typical masculine and feminine nouns,
were replicated in the second experiment. Table 2 presents the data for
typical masculine and feminine nouns as a function of prime lexicality and
prime inflection. A group x prime lexicality x target gender x prime
inflection analysis of variance suggests that the outcome of Experiment 2 was
very similar to that of Experiment 1: Target gender was significant,
F(1,48) = 15.69, MSe = 2610, p < .001; target gender by prime inflection was
significant, F(1,48) = 20.53, MSe = 4534, p < .001; and target gender by prime
inflection by prime lexicality was significant, F(1,48) = 30.47, MSe = 2232,
p < .001. Although the main effect of groups was not significant, there were
significant interactions involving groups: group by prime inflection,
F(3,48) = 13.66, MSe = 2222, p < .001; group by prime lexicality,
F(3,48) = 5.57, MSe = 5670, p < .01; group by prime inflection by prime
lexicality, F(3,48) = 11.30, MSe = 1958, p < .001; and the four-way
interaction. These interactions identify the differences in the pairs of
stimuli assigned to the groups.

Table 2

Lexical Decision Times and Error Rates for Typical Masculine and Feminine
Nouns as a Function of Prime Lexicality and Prime Inflection

                                                     (typical) Noun gender
Type of prime                Prime inflection    masculine (ø)     feminine (A)

possessive adjective         Masculine (ø)       657±93 a          687±92
                                                 1.4 b             2.4
                             Feminine (A)        717±112           636±79
                                                 4.8               0.50

pseudopossessive adjective   Masculine (ø)       670±84            661±91
                                                 5.8               1.4
                             Feminine (A)        666±80            647±73
                                                 4.3               1.9

a mean reaction time and standard deviation (ms)
b percentage of responses that were incorrect


Table 3

Lexical Decision Times and Error Rates for Atypical Masculine and Feminine
Nouns as a Function of Prime Lexicality and Prime Inflection

                                                     (atypical) Noun gender
Type of prime                Prime inflection    masculine (A)     feminine (ø)

possessive adjective         Masculine (ø)       712±107 a         692±108
                                                 5.8 b             4.8
                             Feminine (A)        734±102           647±86
                                                 7.7               1.4

pseudopossessive adjective   Masculine (ø)       723±104           675±75
                                                 8.2               5.3
                             Feminine (A)        730±99            652±69
                                                 8.7               2.4

a mean reaction time and standard deviation (ms)
b percentage of responses that were incorrect


As with Experiment 1 it can be claimed that lexical decision times for
target nouns of the typical type depended on whether the inflected ending of
the prime was consistent with the gender of the noun. This dependency is
greater for word-word pairs than for pseudoword-word pairs. Protected t-tests
confirmed the difference between congruent word-word pairs and incongruent
word-word pairs for the masculine nouns, t(48) = 6.49, p < .001, and between
congruent word-word pairs and incongruent word-word pairs for the feminine
nouns, t(48) = 5.52, p < .001. However, neither the masculine nor the
feminine comparison was significant for the pseudoword-word pairs.
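
The contrast between word primes and pseudoword primes can be made concrete
with the cell means for typical nouns in Table 2. The following is a minimal
sketch of the subtraction only. The data layout and the function name are
hypothetical, the effect is taken as incongruent minus congruent latency, and
nothing here reproduces the protected t-tests.

```python
# Table 2 cell means (ms), typical nouns, keyed by
# (prime lexicality, prime gender, noun gender).
TYPICAL_MS = {
    ("word", "masc", "masc"): 657,
    ("word", "masc", "fem"): 687,
    ("word", "fem", "masc"): 717,
    ("word", "fem", "fem"): 636,
    ("pseudo", "masc", "masc"): 670,
    ("pseudo", "masc", "fem"): 661,
    ("pseudo", "fem", "masc"): 666,
    ("pseudo", "fem", "fem"): 647,
}

OTHER_GENDER = {"masc": "fem", "fem": "masc"}


def typical_congruency_effect(lexicality, noun):
    """Latency after the gender-incongruent prime minus the congruent one."""
    incongruent = TYPICAL_MS[(lexicality, OTHER_GENDER[noun], noun)]
    congruent = TYPICAL_MS[(lexicality, noun, noun)]
    return incongruent - congruent


effects = {
    (lex, noun): typical_congruency_effect(lex, noun)
    for lex in ("word", "pseudo")
    for noun in ("masc", "fem")
}
```

On these means the difference is 60 ms and 51 ms after real possessive
adjectives for masculine and feminine nouns, against -4 ms and 14 ms after
pseudopossessive adjectives.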

Is the gender congruency/incongruency effect exhibited by possessive
adjective-noun pairs constructed with atypical nouns? Table 3 presents the
data for the atypical masculine and feminine nouns as a function of prime
lexicality and prime inflection. Comparison of Table 3 with Table 2 suggests
a similar, though not identical, pattern of results. An analysis of variance
conducted over the combinations of groups, prime lexicality, target gender,
and prime inflection yielded significant effects for target gender,
F(1,48) = 99.87, MSe = 3495, p < .001, and for the interaction of target
gender with prime inflection, F(1,48) = 21.68, MSe = 2869, p < .001. There was
no main effect of groups but all the interactions with group were significant,
as above. Like typical nouns, atypical nouns exhibit a gender
congruency/incongruency effect but, unlike typical nouns, the magnitude of the
effect is less dependent on the lexicality of the prime.

It is noteworthy that there was a large difference in errors between
atypical masculine nouns (more) and atypical feminine nouns (fewer),
F(1,48) = 11.92, p < .001, and that the errors committed on these two noun
types depended differently on the inflection of the preceding prime,
F(1,48) = 4.44, p < .05. The same analysis on the typical nouns revealed that
the masculine nouns were again the source of most errors, F(1,48) = 7.65,
p < .01, but that there was no interaction of target gender with prime
inflection. Overall, the errors for both analyses follow the pattern of the
decision latencies (compare Tables 2 and 3) but it is not obvious why, in all
analyses (Experiment 1 and Experiment 2), latencies are longer on average and
errors are greater on average for masculine nouns.

The third question is whether the gender congruency/incongruency effect
differs between typical and atypical masculine nouns. The number of masculine
nouns that end in A is very small, as noted, and the number of nouns in this
category used in the experiment almost exhausts the category. By and large,
masculine nouns inflected with A in the nominative singular occur less
frequently than masculine nouns inflected with Ø in the nominative singular.
A group x prime lexicality x prime inflection x target inflection (typical
vs. atypical type) analysis of variance was conducted. The main effect of
prime inflection was significant, F(1,48) = 4.99, MSe = 9249, p < .05:
Ø-inflected primes were associated with faster lexical decisions (691 ms)
than A-inflected primes (711 ms). The difference between typical and
atypical nouns was significant, F(1,48) = 83.39, MSe = 2768, p < .001; the
atypical nouns were responded to more slowly (723 ms) than the typical nouns
(680 ms), probably because of their lower frequency of occurrence. The
interaction of prime lexicality and prime inflection was significant,
F(1,48) = 4.28, MSe = 9822, p < .05, as was the interaction of prime lexicality
and target inflection, F(1,48) = 5.97, MSe = 2145, p < .01. There was no
two-way interaction between inflection of the prime and the typicality of the
inflection of the noun. Lexical decision times for typical masculine nouns
preceded by the congruent Ø-inflected primes (real and pseudo) were 33 ms
shorter, on the average, than lexical decision times for typical masculine
nouns preceded by incongruent A-inflected primes (real and pseudo). This
average difference for atypical masculine nouns was 15 ms. There was, however,
a significant three-way interaction among prime lexicality, prime inflection,
and target inflection (typical vs. atypical), F(1,15) = 5.06,
MSe = 3193, p < .05. Inspection of Tables 2 and 3 reveals that the inflection
of the pseudoadjective prime did not matter for either typical or atypical
nouns. The congruency/incongruency difference was -J* ms and -7 ms,
respectively. In contrast, the inflection of the adjective prime did matter
for both typical nouns and atypical nouns and it mattered more for the typical
nouns than the atypical nouns. The congruency/incongruency difference was 60
ms and 22 ms, respectively. In sum, the data suggest that the magnitude of
the gender congruency/incongruency effect differed between typical and
atypical masculine nouns.

The fourth question addressed parallels the third. Does the gender
congruency/incongruency effect differ between typical and atypical feminine
nouns? The answer in this case is negative. A group x prime lexicality x
prime inflection x target inflection (typical vs. atypical) analysis of
variance revealed only one significant effect, namely, the main effect of
prime inflection, F(1,48) = 17.30, MSe = 6675, p < .001; A-inflected primes
were associated with faster lexical decisions (648 ms) than Ø-inflected
primes (678 ms), as ought to be the case for feminine noun targets.

Finally, with respect to the pseudonoun data, separate analyses of variance
revealed that for both the typical and atypical cases there was a significant
effect of target inflection (Ø vs. A): F(1,51) = 6.54, MSe = 3050,
p < .01 and F(1,51) = 4.77, MSe = 4290, p < .05, respectively. Pseudonouns
ending in A were rejected more slowly. A further significant effect was
observed in the atypical analysis, namely, the interaction of prime lexicality
and target inflection, F(1,51) = 18.90, MSe = 2827, p < .001. Whereas
Ø-inflected atypical pseudonouns were responded to faster when preceded by a
pseudo-possessive adjective, A-inflected atypical pseudonouns were responded
to faster when preceded by a possessive adjective. The data equivocate on
whether or not rejecting pseudonouns was made more difficult by a
grammatically and lexically proper context.



In the present experiments, possessive adjectives provide a minimal
grammatical context for nouns in the nominative singular. With case and
number held constant it is shown that when the two words agree in gender,
lexical decision on the target noun is faster than when the two words disagree
in gender. A previous experiment (Gurjanov et al., 1985) found no effect of
case congruency on the processing of nouns in the nominative singular. That
gender congruency does affect the processing of nominative singulars may have
implications for the representation of inflected nouns in the internal lexicon
(Lukatela et al., 1980).

The lesson learned from Experiment 2 is that the gender
congruency/incongruency effect is not mediated by visual identity or phonemic
identity of the morphemes that inflect the possessive adjective and the noun.
This latter observation implies that the gender congruency/incongruency effect
must involve the recognition of the genders of the possessive adjective and
the noun, which implies, in turn, that gender is part of a word's
representation in the lexicon. It is not presumptuous to assume that one's
knowledge of words includes a knowledge of the grammatical arrangements into
which they may enter. To know that the feminine possessive adjective MOJA
cannot be entered into a grammatical arrangement with the masculine nouns
LONAC or DEDA is to know that MOJA and LONAC or MOJA and DEDA are of unlike
gender. On the other hand, to know that the masculine possessive adjective
MOJ can be linked to the masculine nouns LONAC and DEDA is to know that these
words are alike in case, number, and gender.

The argument that there is a syntactical/grammatical processor is an
argument for a device separate from the device that accesses lexical
representations and separate from the device that assigns meaning to an
arrangement of words (cf. Forster, 1979). The syntactic/grammatical processor
assigns a syntactical structure or a grammatical relation to a context-target
arrangement. It obviously has a degree of autonomy; there are many celebrated
examples of English syntactical structure being assignable to a list of
nonsense letter strings. However, with respect to the question of the
information with which the syntactic or grammatical process works, it must be
supposed that that information is derived in large part by the lexical
processor. Seidenberg et al. (1982) showed that in English lexical priming
contexts, facilitation effects are not indifferent to the grammatical function
of words and argued for a model of the internal lexicon enriched by
syntactical details, an argument consonant with the suggestions of Kaplan and
Bresnan (1982) and Gazdar (1982) in theoretical linguistics and continuous
with the experimental efforts of Huttenlocher and Lui (1979), Miller and
Johnson-Laird (1976), and others to distinguish the mental representations of
different word classes.

Given the notions of lexical processor, grammatical processor and message
processor (Forster, 1979) as three relatively independent systems underlying
lexical decision, an account of the gender congruency/incongruency effect
takes the following form (after West & Stanovich, 1982). When a grammatically
congruent pair (e.g., MOJ LONAC, MOJ DEDA, MOJA PTICA, or MOJA MATER) is
presented, the outputs from the lexical processor, grammatical processor and
message processor are all positive, which is the ideal situation for a
subsequent decision-making mechanism that must arrive at the appropriate
response "yes." However, when a grammatically incongruent pair (e.g., MOJA
LONAC, MOJA DEDA, MOJ PTICA, or MOJ MATER) is presented, the output from the
lexical processor is positive and so, perhaps, is the output from the message
processor, but the output from the grammatical processor is negative. The
information made available to the grammatical processor from the lexical
processor is that the context is one gender and the target is another gender.
Consequently, the situation for the decision-making system is less than ideal;
there are discrepancies in the outputs and the "no" bias from the grammatical
processor must be overcome (West & Stanovich, 1982). As a result, lexical
decision to a grammatically incongruent pair (e.g., MOJA LONAC) is slower than
lexical decision to a grammatically congruent pair (e.g., MOJ LONAC).
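
The qualitative logic of this account can be put in a few lines of Python.
The sketch below is only an illustration and not a model fitted to the data:
the function name and the two latency parameters (base_ms, no_bias_cost_ms)
are hypothetical, and their values are arbitrary. It assumes a real-word
target, so the lexical processor's output is taken to be positive, and it
charges a fixed cost for each negative output that the decision stage must
overcome.

```python
def yes_latency(grammatical_ok, message_ok=True,
                base_ms=700.0, no_bias_cost_ms=30.0):
    """Toy latency (ms) for a "yes" lexical decision on a real-word target.

    The lexical processor's output is positive by assumption (the target is
    a word). Each negative output from the grammatical or message processor
    adds a fixed cost for overcoming its "no" bias. Both numeric parameters
    are arbitrary illustrative values, not estimates from the experiments.
    """
    negatives = [grammatical_ok, message_ok].count(False)
    return base_ms + no_bias_cost_ms * negatives


# Congruent pair (e.g., MOJ LONAC): all outputs positive.
congruent = yes_latency(grammatical_ok=True)
# Incongruent pair (e.g., MOJA LONAC): grammatical output negative.
incongruent = yes_latency(grammatical_ok=False)
```

With any positive cost the incongruent latency exceeds the congruent one,
which is the only property of the account that the sketch is meant to show.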

The foregoing account is sufficiently general to accommodate the
syntactic or grammatical priming effects found with English language materials
(Goodman et al., 1981; Wright & Garrett, 1984) and those found with
Serbo-Croatian language materials. Where the account is weak is in its failure
to distinguish those components of grammatical processing that are automatic
or reflexive (Fodor, 1983; Wright & Garrett, 1984) from those that are merely
strategic, that is, those that are "conscious-attentive" and shaped by the
conditions of the experiment. This failure is due in part to the lack of data
relevant to the contrast. It has been established empirically that
associative priming involves components of both kinds and the theory of
associative priming ably recognizes the distinction (Neely, 1977). If
syntactic or grammatical priming proves to depend similarly on a fast-acting
automatic process and a slow-acting conscious-attentive process, then this
much seems certain: In syntactic or grammatical priming both of these
processes are post-lexical (Gurjanov et al., 1985; Seidenberg et al., 1984;
West & Stanovich, 1982).

GRAMMATICAL PRIMING OF INFLECTED NOUNS BY INFLECTED ADJECTIVES*



M. Gurjanov,† G. Lukatela,† Jasmina Moskovljević,† M. Savić,†
and M. T. Turvey††



Abstract. Two experiments are reported in which subjects made rapid
lexical decisions about inflected nouns preceded by inflected 
adjectives or pseudoadjectives that did or did not agree grammati- 
cally. Both adjectives and pseudoadjectives were shown to affect 
lexical decision times for nouns, suggesting that the priming of 
inflected nouns by inflected adjectives occurred at the level of the 
inflections. Inflected pseudonouns, however, were not affected sim- 
ilarly, suggesting that lexical factors were contributing to the 
priming in addition to grammatical factors. This instance of 
grammatical priming is described as an effect that arises post-lexi- 
cally, based on the outcomes of relatively independent lexical and 
syntactical processors. 

Two broad questions may be raised with regard to the processing of nouns 
in an inflected language: (1) How are the cases of a noun organized with re- 
gard to each other in the internal lexicon?; and (2) How are inflected nouns 
linked to other lexical types such as prepositions and inflected adjectives? 
Serbo-Croatian is an inflected language in which the noun takes a gender 
(masculine, feminine, or neuter) and is declined in seven forms (nominative, 
accusative, instrumental, genitive, dative, locative, vocative), both in the 
singular and the plural. The fourteen inflected forms of a Serbo-Croatian 
noun can be viewed as forming a noun system (Lukatela, Gligorijević, Kostić, &
Turvey, 1980). Ordinarily an inflected Serbo-Croatian noun in a sentence is 
grammatically related to a preposition and to one or more adjectives. Al- 
though they are not declined, prepositions are specific to inflected noun end- 
ings. A given preposition goes with at least one noun case, sometimes several 
cases but never with all noun cases. Adjectives are declined but not 
necessarily with the same inflected endings as nouns. When qualifying a noun, 
however, the inflection of the adjective and the inflection of the noun must 
agree grammatically (for example, if the noun is masculine and in the singular
accusative form, the adjective must be masculine and in the singular 
accusative form). 

With respect to the first question raised above on the organization of 
the cases there is evidence to suggest that frequency — precisely, the frequen- 
cies with which the various inflected noun forms occur in ordinary language 
usage — is not a major determinant of a Serbo-Croatian noun system's organiza- 
tion. In a lexical decision task nouns in the nominative singular form were 
accepted as words faster than nouns in the oblique forms. Among the oblique 
forms, however, decision times did not differ despite marked differences among 
the oblique forms in their respective frequencies of occurrence (Lukatela et 
al., 1978; Lukatela et al., 1980). Apparently, the nominative and oblique 
forms are qualitatively distinguished in the organization of a noun system 
with the nominative assuming a pivotal role. However, in either an oblique or 
nominative form a noun appears to be represented in the lexicon as a single 
unit corresponding to the complete word rather than as a combination of dis- 
tinct units corresponding to morphemic constituents. The stems and suffixes 
of Serbo-Croatian nouns do not appear to be stored separately. An observation 
of the unitary representation of nouns, however, does not rule out the possi- 
bility that noun representations indicate their stem/suffix structure 
(Stanners, Neiser, & Painton, 1979; Taft & Forster, 1975). 

With respect to the second question raised above (on the processing
relation of nouns to other lexical types), it has been shown that with
Serbo-Croatian words a preposition preceding a noun case with which it is
grammatically consistent speeds up lexical decision on the noun. However,
lexical decision on an inflected noun form that is grammatically inconsistent
with the preceding preposition is not appreciably slowed (Lukatela, Kostić,
Feldman, & Turvey, 1983). Facilitation (and inhibition) effects among words
are often explained (but not always, see Discussion) by a notion of activation
spreading out from one excited region of the lexicon to neighboring regions
and/or by a notion of a directing of attention to a specified region of the
lexicon. The first of these mechanisms may be suited to semantic relations
among lexical entries but it is not easily generalized to grammatical
relations such as between members of a closed class like prepositions and an
open class like nouns (and it is not easily generalized to semantic relations
in natural discourse, as Foss [1982] has noted). The notion of an automatic
spread of activation refers to a specific linkage between particular
representations of particular words (see Collins & Loftus, 1975): (direct)
stimulation of one lexical representation leads mechanically and inevitably
to the (indirect) stimulation of other lexical representations. The relation
of prepositions to nouns, however, is not sensibly portrayed as linkages among
particular internal representations of complete words. (What would
rationalize the linkage of above and elephant?) If there are linkages one
might expect them to be defined over the small set of prepositions and the
small set of morphemes that comprise the inflected endings of nouns. By such
an account, prepositions would not be linked to the very many noun systems but
to the few sets of inflected endings that the very many noun systems share.
The problem with this account is that the inflected endings of
(Serbo-Croatian) nouns do not appear to be stored as sets separately from
their stems.

The present experiments extend the inquiry into Serbo-Croatian nouns and 
their processing relation to other word types. Here the focus is the relation 
of nouns to adjectives. Two related questions are raised. First, can 
adjectives affect the time to lexically evaluate nouns with which they are
grammatically consistent? And second, if adjectives can affect lexical deci- 
sions on nouns do they do so at the morphological level, that is, the level of 
stems and affixes (rather than, say, the whole word level)? Support for the 
view that adjectival influences on nouns can be mediated by processes at the 
level of inflected endings would be provided by the demonstration that both 
adjective contexts and pseudoadjective contexts (letter strings derived from 
adjectives by changing the initial or middle consonant) expedite lexical deci- 
sions on noun targets when the inflection of the contextual item and the tar- 
get are in grammatical agreement. 

The selection of nouns used in the experiments was guided by the following
considerations. With a few exceptions Serbo-Croatian nouns fall into
three declensional classes according to the inflected ending of the genitive
singular case. These three classes are designated (after Bidwell, 1970) as
Class A (where the genitive singular ending is /e/, for example, ZENE), Class
O (where the genitive singular ending is /a/, for example, COVEKA), and Class
C (where the genitive singular ending is /i/, for example, STVARI). The
dominating gender for Class A nouns is feminine. The nouns in Class C are
almost exclusively feminine but Class C occurs less frequently than Class A.
Class O nouns are mostly masculine and neuter nouns. From a consideration of
nouns in the ordinary, written language, Kostić (1965) reported that the
masculine gender accounts for 52 percent, the feminine gender for 36 percent
and the neuter gender for 12 percent. Consequently, the nouns in the corpus
of words from which the stimuli of the present experiment were drawn occurred
in the three genders in approximately the proportions identified by Kostić,
with the masculine and neuter nouns drawn from the declension Class O and the
feminine nouns drawn from the declension Class A.

The adjectives in the corpus of words from which the stimuli were drawn
were common adjectives all declined as indefinite adjectives. Common
adjectives are those that can be declined both definitely and indefinitely.
The indefinite declension of an adjective applies when the function is either
predication or attribution. In the latter role the indefinite adjective is
not accompanied by a deictic such as "this," "that," etc., and is
referentially vague. Definite adjectives are restricted to the attributive
function and are always conjoined with a deictic. When an adjective qualifies
an inanimate noun in the masculine gender, the indefinite and definite
declensions are distinguished by the inflected endings of the nominative
singular and accusative singular. There are, however, no such written
distinctions for the definite and indefinite adjectival declensions when the
word being qualified is an inanimate noun in the feminine gender (although
such distinctions can be found in the spoken language in the form of stress
variations). The choice of the referentially less precise indefinite
declension was motivated, in part, by the desire to keep to a minimum the
semantic relation between the adjectival and nominal forms paired in the
experiments.

The first experiment was directed at the effect of grammatical consistency
between adjectives (real and pseudo) and nouns in the nominative singular
and genitive singular cases. These two cases are the most frequently
occurring noun cases: the nominative singular accounts for approximately 25
percent, and the genitive singular for approximately 20 percent, of
all instances of the noun (Kostić, 1965; Lukatela et al., 1980). The
inflections of these two cases for adjectives and nouns of all three genders
are shown in Table 1. Only for the feminine gender are the adjectival and
nominal inflections identical.



Table 1

Nominative Singular and Genitive Singular Inflections of Serbo-Croatian
Adjectives and Nouns as a Function of Gender

                          MASCULINE           FEMININE             NEUTER
                       ADJECTIVE  NOUN     ADJECTIVE  NOUN     ADJECTIVE  NOUN

NOMINATIVE SINGULAR        Ø       Ø           A       A           O     E or O
GENITIVE SINGULAR          OG      A           E       E           OG      A

Ø = null morpheme



There is some reason to believe that the effect of a preceding
grammatically consistent adjective on lexical decision will not be of the same
magnitude for nouns in the nominative singular case and nouns in the genitive
singular case. As noted above, the nominative singular of a noun is
qualitatively distinguished from the oblique cases of a noun and appears to
play a pivotal role in the organization of a noun's case system (Lukatela et
al., 1980). Moreover, the nominative singular is less dependent on
grammatical factors for its interpretation than are the oblique cases (see
Lukatela et al., 1983). It was expected, therefore, that for nouns in the
genitive singular lexical decision would be fastest when the prime was
grammatically consistent but for nouns in the nominative singular lexical
decision times would be less partial to the grammatical consistency of prime
and target.

Method 

Subjects. Fifty-six undergraduate students from the Department of
Psychology at the University of Belgrade participated in the experiment. All 
subjects had previously participated in reaction time experiments. 

Materials. A list of 150 adjective-noun pairs was constructed with all
adjectives and nouns (1) drawn from the mid-frequency range of the Kostić
table, (2) in the nominative singular form and (3) comprising pairs that were
congruent in gender. This list was presented to 70 students (from the
Department of Linguistics) who judged the associative strength of each pair,
that is, the degree to which the adjective and the noun in a pair were related.
The twenty-eight adjective-noun pairs that were judged to be most weakly
associated were used to generate four groups of 28 word-word pairs:
nominative singular-nominative singular pairs, nominative singular-genitive
singular pairs, genitive singular-nominative singular pairs and genitive
singular-genitive singular pairs. (In each of the foregoing pair types, the
first case is that of the adjective and the second case is that of the noun.)

A different set of adjective-noun pairs, drawn from the original list of
150 pairs, was used to generate four corresponding groups of 28
pseudoadjective-pseudonoun pairs (by changing either the initial or middle
letter of both the adjective and the noun). Another set of 28 pairs from the
original list of 150 pairs was used to generate four corresponding groups of
28 pseudoadjective-noun pairs (by changing either the initial or middle letter
of the adjective). Finally, one further, different set of 28 pairs was
transformed into four corresponding groups of 28 adjective-pseudonoun pairs
(by changing either the initial or middle letter of the noun). Throughout the
generation of these different groups (those that paired pseudowords or paired
a pseudoword with a word), the pseudoword version of a noun or adjective in
nominative singular or genitive singular preserved the case ending.

The adjectives and pseudoadjectives were presented as Roman letter 
strings (IBM Gothic) arranged horizontally in the upper half of 35 mm slides. 
In contrast, nouns and pseudonouns were arranged horizontally in the lower 
half of 35 mm slides. The "adjective" slides and the "noun" slides were 
grouped into pairs as determined above to yield a total of 448 pairs of slides 
(28x4x4) of which a given subject saw 112 pairs. 

Design. The major constraint on the design of the experiment was that a
given subject never encountered a given word or pseudoword in any of the pairs
more than once. This was achieved by dividing subjects into four groups with
14 subjects in each group and by dividing each set of 28 pairs into four
subgroups of 7 pairs. In sum, a subject saw 7 pairs of stimuli from each of
the 16 groups of pairs. Put differently, each subject saw the same adjectives
and nouns as every other subject but not necessarily in the same grammatical
case nor necessarily in the same type of nominative-genitive permutation.
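
The counterbalancing just described can be sketched as a rotation of item
subgroups across subject groups. The sketch is illustrative only: the names
(PAIR_TYPES, CASE_COMBOS, assign) and the particular rotation rule are
assumptions, since the text states only that each set of 28 pairs was split
into four subgroups of 7 and that no subject saw an item more than once.

```python
from itertools import product

# Four prime-target types, each built from its own set of 28 base pairs.
PAIR_TYPES = ["adj-noun", "pseudoadj-pseudonoun",
              "pseudoadj-noun", "adj-pseudonoun"]
# Four case permutations: (case of adjective, case of noun).
CASE_COMBOS = [("NOM", "NOM"), ("NOM", "GEN"), ("GEN", "NOM"), ("GEN", "GEN")]

N_GROUPS = 4        # subject groups (14 subjects each in the experiment)
SET_SIZE = 28       # base pairs per prime-target type
SUBGROUP_SIZE = 7   # base pairs per subgroup


def assign(group):
    """Return (pair type, base-pair index, case combo) triples for one group.

    Assumed rotation: subgroup s of every set is presented in case
    permutation (s + group) mod 4, so each base pair is seen exactly once
    per subject and in all four permutations across the four groups.
    """
    trials = []
    for pair_type, item in product(PAIR_TYPES, range(SET_SIZE)):
        subgroup = item // SUBGROUP_SIZE
        combo = CASE_COMBOS[(subgroup + group) % len(CASE_COMBOS)]
        trials.append((pair_type, item, combo))
    return trials
```

Under this rotation every subject group receives 4 x 28 = 112 trials, 7 in
each of the 16 combinations of pair type and case permutation, which matches
the counts given in the text.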

Procedure. On each trial, two slides were presented. The subject's task
was to decide as rapidly as possible whether the letter string contained in a 
slide was a word. Each slide was exposed in one channel of a three-channel 
tachistoscope (Scientific Prototype, Model GB) illuminated at 10.3 cd/m2. 
Both hands were used in responding to the stimuli. Both thumbs were placed on 
a telegraph key button close to the subject and both forefingers on another 
telegraph key button two inches further away. The closer button was depressed 
for a "No" response (the string of letters was not a word); and the further 
button was depressed for a "Yes" response (the string of letters was a word). 

Latency was measured from the onset of a slide. The subject's response
to the first slide terminated its duration and initiated the second slide un- 
less the latency exceeded 1300 ms, in which case the second slide was initiat- 
ed automatically. The duration of the second slide, unlike that of the first, 
was fixed at 1300 ms. 

Results and Discussion 

A mean reaction time was computed for each subject by averaging over the
seven nouns or seven pseudonouns in each group of prime-target pairs.
Reaction times less than 300 ms and longer than 1300 ms were excluded as were
the times associated with erroneous responses. The total number of responses
excluded by the preceding criteria did not exceed 1.5 percent.
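
The exclusion rule amounts to a simple filter applied before averaging. The
following sketch assumes a hypothetical record layout, a list of
(latency in ms, correct?) tuples for one subject in one condition; only the
two cutoffs and the exclusion of errors come from the text.

```python
from statistics import mean

RT_MIN_MS = 300    # latencies below this were excluded
RT_MAX_MS = 1300   # latencies above this were excluded


def cell_mean(trials):
    """Mean latency over correct trials within the 300-1300 ms window.

    `trials` is a list of (rt_ms, correct) tuples; this layout is an
    assumption made for illustration. Returns None when no trial survives.
    """
    kept = [rt for rt, correct in trials
            if correct and RT_MIN_MS <= rt <= RT_MAX_MS]
    return mean(kept) if kept else None
```

For example, with the made-up trials (250, correct), (700, correct),
(900, correct), (1400, correct) and (800, error), only the 700 ms and 900 ms
latencies enter the mean.
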

Table 2

Lexical Decision Latencies and Percentage Error for Pseudonouns in Experiment
1 as a Function of Type and Grammatical Case of Adjectival Prime

                                        Grammatical case of target pseudonoun
Type of prime      Grammatical case
                   of prime             NOMINATIVE           GENITIVE

ADJECTIVE          NOMINATIVE           822 a  (3.3 b)       845  (5.1)
                   GENITIVE             831    (3.3)         833  (2.8)

PSEUDOADJECTIVE    NOMINATIVE           821    (3.1)         822  (3.3)
                   GENITIVE             824    (3.1)         833  (2.6)

a reaction time (ms)
b percentage error



Table 3

Lexical Decision Latencies and Percentage Error for Nouns in Experiment 1 as a
Function of Type and Grammatical Case of Adjectival Prime

                                        Grammatical case of target noun
Type of prime      Grammatical case
                   of prime             NOMINATIVE           GENITIVE

ADJECTIVE          NOMINATIVE           727 a  (4.3 b)       781  (4.8)
                   GENITIVE             720    (4.6)         714  (4.8)

PSEUDOADJECTIVE    NOMINATIVE           713    (7.3)         795  (6.6)
                   GENITIVE             694    (4.3)         773  (2.8)

a reaction time (ms)
b percentage error

Table 2 reports the pseudonoun data. As can be seen, there were no
differences due to the type of prime, the grammatical case of the prime, or
the grammatical case (inflected ending) of the pseudonoun. The mean reaction
times to the primes themselves were 706 ms and 726 ms, respectively, for
adjectives in the nominative singular and genitive singular forms, and 841 ms
and 870 ms, respectively, for pseudoadjectives inflected in the fashion of the
nominative singular and genitive singular. Table 3 reports the noun data.
The only effects that were significant according to the analysis of variance
on both subject and item means were: grammatical case of the adjectival prime
(F(1,52) = 24.31, MSe = 2082, p < .001 and F(1,27) = 4.46, MSe = 5676,
p < .05) and grammatical case of the noun target (F(1,52) = 145.26,
MSe = 2733, p < .001 and F(1,27) = 26.36, MSe = 7532, p < .001).

The failure to observe a significant priming effect by either adjectives
or pseudoadjectives might have been expected. Approximately half of the words
used in the experiment were feminine. The genitive singular form of feminine
nouns (and adjectives) is identical to the nominative plural form of feminine
nouns (and adjectives) (see Table 1). As noted, the nominative singular case
of nouns has not proven to be sensitive to priming. If the nominative plural
is similarly indifferent to priming and if the feminine "genitive singular"
noun forms of the present experiment were interpreted as nominative plural
forms, then the adjectival and pseudoadjectival priming of nouns would be
thwarted. Table 4 distinguishes the mean decision times for the
masculine/neuter items from those for the feminine items. Inspection of Table
4 suggests that (1) adjectival and pseudoadjectival effects were present for
the masculine/neuter genitive singular forms (corroborated by a subject
analysis, F(1,52) = 6.22, MSe = 21961, p < .02, but not by an item analysis)
and absent for the feminine genitive singular forms (the prime case by target
case interaction was not significant by either subjects or items analysis);
and (2)



Table 4

Lexical Decision Latencies of Experiment 1 as a Function of Noun Gender

                                        Gender and case of target noun
Type of prime      Grammatical case     masculine/neuter         feminine
                   of prime             NOMINATIVE  GENITIVE     NOMINATIVE  GENITIVE

ADJECTIVE          NOMINATIVE              721        810           739        719
                   GENITIVE                696        716           731        727

PSEUDOADJECTIVE    NOMINATIVE              708        833           719        731
                   GENITIVE                697        803           704        728



the commonly obtained (e.g., Lukatela et al., 1978; Lukatela et al., 1980;
Lukatela et al., 1983) faster decision times for nominative singular forms
relative to oblique forms were not found with the feminine noun data, implying
that the feminine nouns in the "genitive singular" were not being interpreted
as such. It should be noted that a similar but less pronounced confounding of
cases is also true for the neuter genitive singular (which is written
identically to the nominative plural and genitive plural). However, whereas
for the feminine gender both nouns and adjectives assume identical forms in
the genitive singular and nominative plural, for the neuter gender identity of
forms holds only for nouns.
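
The two observations drawn from Table 4 can be checked with simple
arithmetic on its cell means. The sketch below assumes the reading of Table 4
in which rows give the type and case of the prime and columns give the gender
and case of the target noun; the dictionary layout and function names are
illustrative, and the averages are unweighted means over cells.

```python
# Table 4 cell means (ms): (prime type, prime case) -> gender -> target case.
TABLE_4_MS = {
    ("adjective", "NOM"): {"m/n": {"NOM": 721, "GEN": 810},
                           "f": {"NOM": 739, "GEN": 719}},
    ("adjective", "GEN"): {"m/n": {"NOM": 696, "GEN": 716},
                           "f": {"NOM": 731, "GEN": 727}},
    ("pseudoadjective", "NOM"): {"m/n": {"NOM": 708, "GEN": 833},
                                 "f": {"NOM": 719, "GEN": 731}},
    ("pseudoadjective", "GEN"): {"m/n": {"NOM": 697, "GEN": 803},
                                 "f": {"NOM": 704, "GEN": 728}},
}


def genitive_priming(prime_type, gender):
    """Genitive-target latency after a nominative prime minus after a
    genitive prime (positive = faster with the consistent prime)."""
    incongruent = TABLE_4_MS[(prime_type, "NOM")][gender]["GEN"]
    congruent = TABLE_4_MS[(prime_type, "GEN")][gender]["GEN"]
    return incongruent - congruent


def nominative_advantage(gender):
    """Mean genitive-target latency minus mean nominative-target latency,
    averaged without weighting over the four prime conditions."""
    rows = [cells[gender] for cells in TABLE_4_MS.values()]
    gen = sum(r["GEN"] for r in rows) / len(rows)
    nom = sum(r["NOM"] for r in rows) / len(rows)
    return gen - nom
```

On these cell means the genitive priming difference is 94 ms (adjective
primes) and 30 ms (pseudoadjective primes) for masculine/neuter nouns but
-8 ms and 3 ms for feminine nouns, and the nominative advantage is 85 ms for
masculine/neuter nouns against 3 ms for feminine nouns.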

Experiment 2 

The second experiment used the same design, the same procedure, and the
same adjective-noun pairs as those of the first experiment but replaced the
genitive singular case by the dative-locative singular case and used a new
group of 56 subjects from the same subject pool. In the declension of
adjectives and nouns the dative singular and the locative singular are
identical in each of the three genders. The characteristic inflections common
to dative singular and locative singular are shown in Table 5. With respect to
the noun case confoundings identified above, the dative singular-locative
singular inflection across the three genders is not shared with the nominative
plural and, in fact, is shared with no other case. Thus, in comparison to
Experiment 1, grammatical priming between feminine gender words should be
observed in Experiment 2 if, indeed, the failure to obtain such priming in
Experiment 1 was due to case confounding.



Table 5 

Inflections of Dative Singular and Locative Singular Adjectives and Nouns as a 
Function of Gender 

Masculine Feminine Neuter 

ADJECTIVE OM OJ OM 

NOUN U I U 



Results and Discussion 

The mean lexical decision latencies were computed in the manner described
in Experiment 1. The positive and negative responses to the adjectival primes
were similar in pattern to those reported for Experiment 1. Negative
responses to the pseudonoun targets are given in Table 6. No main effects or
interactions were significant. Table 7 reports the noun data for all three
genders taken together. The analysis of variance on subject means and item
means (reported in parentheses) revealed significant effects for the
grammatical case of the adjective prime, F(1,52) = 40.59, MSe = 1372, p < .001
(F(1,27) = 8.05, MSe = 3460, p < .01); for the grammatical case of the noun
target, F(1,52) = 61.27, MSe = 2508, p < .001 (F(1,27) = 22.98, MSe = 3343,
p < .001); for the type of adjectival prime, F(1,52) = 7.88, MSe = 4104,
p < .01 (F(1,27) = 8.85, MSe = 1827, p < .01); and for the interaction between
the

Table 6

Lexical Decision Latencies and Percentage Error for Pseudonouns in Experiment
2 as a Function of Type and Grammatical Case of Adjectival Prime

                                        Grammatical case of target noun
Type of prime      Grammatical case
                   of prime             NOMINATIVE           DATIVE/LOCATIVE

ADJECTIVE          NOMINATIVE           758 a  (4.6 b)       758  (5.6)
                   DATIVE/LOCATIVE      757    (3.6)         774  (2.0)

PSEUDOADJECTIVE    NOMINATIVE           763    (1.8)         767  (4.8)
                   DATIVE/LOCATIVE      750    (3.3)         760  (4.8)

a reaction time (ms)
b percentage error



Table 7

Lexical Decision Latencies and Percentage Error for Nouns in Experiment 2 as a
Function of Type and Grammatical Case of Adjectival Prime

                                       Type of prime

                          ADJECTIVE                    PSEUDOADJECTIVE

Grammatical case               Grammatical case of prime
of target noun      NOMINATIVE  DATIVE/LOCATIVE   NOMINATIVE  DATIVE/LOCATIVE

NOMINATIVE          672 a       668               656         647
                    3.3 b       2.6               3.8         0.8

DATIVE/LOCATIVE     726         685               708         673
                    3.1         4.6               3.3         3.3

a reaction time (ms)
b error (%)
grammatical case of the adjectival prime and the grammatical case of the noun
target, F(1,52) = 16.10, MSe = 1841, p < .001 (F(1,27) = 4.84, MSe = 3060, p <
.05). All other two-way and three-way interactions were nonsignificant. The
significance of the type of adjectival prime may be attributed to the differ-
ence between responding positively to two successive stimuli (in the adjective
trials) and responding negatively to the first stimulus and positively to the
second stimulus (in the pseudoadjective trials). Intuitively, this interpre-
tation suggests slower decision times for targets following pseudoadjectives.
Inspection of Table 7 (and of Table 3) shows, to the contrary, that 
pseudoadjective primes were associated with overall faster decisions. One is 
tempted to say that the effect of word primes is predominantly "inhibitory." 

Table 8 reports the mean lexical decision times for the nouns partitioned 
according to the masculine/neuter gender and feminine gender categories. 
Inspection of Table 8 and comparisons with the pattern of results in Table 4 
suggest that grammatical priming occurred in both categories in the second
experiment, in contrast to the first, and lend credence to the interpretation
given of the feminine gender data of the first experiment. 



Table 8

Lexical Decision Latencies of Experiment 2 as a Function of Noun Gender

                                      Gender and case of target noun

                  Grammatical    masculine/neuter               feminine
Type of prime     case of prime  NOMINATIVE  DATIVE/LOCATIVE    NOMINATIVE  DATIVE/LOCATIVE

ADJECTIVE         NOMINATIVE     663         717                682         741
                  DATIVE         652         672                680         685

PSEUDOADJECTIVE   NOMINATIVE     651         708                662         708
                  DATIVE         643         669                649         682



Discussion 

The theoretically important descriptors "facilitation" and "inhibition" 
are not applicable to the data of Experiments 1 and 2. In neither experiment 
is there a neutral context to provide a baseline. The results are more 
prudently summarized in terms of an inequality and an equality: 

(1) The lexical decision time for a noun in a grammatically congruent 
adjective or pseudoadjective context is less than the lexical decision 
time for a noun in a grammatically incongruent adjective or 
pseudoadjective context; and

(2) The lexical decision time for a pseudonoun in a grammatically congruent 
adjective or pseudoadjective context is equal to the lexical decision 
time for a pseudonoun in a grammatically incongruent adjective or 
pseudoadjective context. 
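
The inequality and the equality can be checked informally against the reported cell means. The following is a minimal sketch, not the authors' analysis of variance: it takes unweighted averages of the four congruent and the four incongruent cell means from Tables 6 and 7, assuming that the table cells read as rows = case of target and columns = case of prime. The dictionary names and the helper `congruency_effect` are illustrative only.

```python
# Cell means (ms) keyed by (type of prime, case of prime, case of target).
# Values are the reaction times in Table 7 (nouns) and Table 6 (pseudonouns);
# the assignment of values to cells is an assumption about the table layout.
NOM, DAT = "NOMINATIVE", "DATIVE/LOCATIVE"

noun_rt = {
    ("ADJECTIVE", NOM, NOM): 672, ("ADJECTIVE", DAT, NOM): 668,
    ("ADJECTIVE", NOM, DAT): 726, ("ADJECTIVE", DAT, DAT): 685,
    ("PSEUDOADJECTIVE", NOM, NOM): 656, ("PSEUDOADJECTIVE", DAT, NOM): 647,
    ("PSEUDOADJECTIVE", NOM, DAT): 708, ("PSEUDOADJECTIVE", DAT, DAT): 673,
}
pseudonoun_rt = {
    ("ADJECTIVE", NOM, NOM): 758, ("ADJECTIVE", DAT, NOM): 757,
    ("ADJECTIVE", NOM, DAT): 758, ("ADJECTIVE", DAT, DAT): 774,
    ("PSEUDOADJECTIVE", NOM, NOM): 763, ("PSEUDOADJECTIVE", DAT, NOM): 750,
    ("PSEUDOADJECTIVE", NOM, DAT): 767, ("PSEUDOADJECTIVE", DAT, DAT): 760,
}


def congruency_effect(cells):
    """Return mean incongruent latency minus mean congruent latency (ms)."""
    congruent = [rt for (_, prime, target), rt in cells.items() if prime == target]
    incongruent = [rt for (_, prime, target), rt in cells.items() if prime != target]
    return sum(incongruent) / len(incongruent) - sum(congruent) / len(congruent)


print(congruency_effect(noun_rt))        # positive: congruent nouns are faster
print(congruency_effect(pseudonoun_rt))  # small and negative for pseudonouns
```

Under these assumptions the noun difference is about 16 ms in favor of congruent contexts, whereas the pseudonoun difference is about 6 ms in the opposite direction, a difference that the analysis reported for Table 6 treats as nonsignificant.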
An adjective or pseudoadjective defines a minimal grammatical context
(cf. Kroll & Schweickert, 1978) for a target noun. In terms of a distinction
suggested by Seidenberg, Tanenhaus, Leiman, and Bienkowski (1982), this mini-
mal grammatical context is "nonpriming," meaning that it contains no lexical
items that are semantic relatives or associates of the target item. By argu-
ment, a nonpriming context cannot have a selective influence on the lexicon; a
selective influence is solely a consequence of intralexical processing. It is
suggested that intralexical processing reflects the interconnections of
entities in semantic memory but it does not reflect grammatical structure and
pragmatic knowledge (Forster, 1979). The context that gives rise to intralex-
ical processing (one that contains items associatively and/or semantically
related to the target) is termed "lexical priming" by Seidenberg et
al. (1982). In the introduction and elsewhere (Lukatela, Moraca, Stojnov,
Savić, Katz, & Turvey, 1982) it has been argued that the effect on lexical
decision of minimal grammatical contexts (e.g., a preposition for a noun, a
pronoun for a verb) does not lend itself to the notion of processing based up-
on interconnections among individual lexical representations. Consequently,
as Lukatela et al. (1982) remark, "...semantic facilitation and grammatical
facilitation are probably best understood not as expressions of a single mech-
anism but rather as an expression of different mechanisms that stand in a com-
plementary relation...." (p. 299)

The sentiment of the preceding quotation is given expression in the lan-
guage-processing system proposed by Forster (1979). Forster's system is com-
posed of three subsystems: (1) a lexical processor that accesses the
representations in the lexicon of the target word and the context words (or
word); (2) a syntactic processor that assigns a syntactic structure to the
sentence constituted by the target word and its context; and (3) a message
processor that assigns meaning to the syntactic structure. All three subsys-
tems feed into a mechanism that, in the context of experiments, functions sim-
ply as a decision-maker (e.g., is it a word?). Differences in positive lexi-
cal decision times for target items associated with different contexts may
originate in the decision-making process, that is, postlexically (West &
Stanovich, 1982). Consider a grammatically congruent adjective-noun pair in
the present experiments. The output from the lexical processor and the output
from the syntactic processor will both be positive. Because of the weak
association between the words in the present experiments the output from the
message processor might be negative or arise too slowly to contribute to the
decision making (cf. de Groot, Thomassen, & Hudson, 1982). In contrast, for a
grammatically incongruent adjective-noun pair the output from the lexical
processor will be positive but the output from the syntactic processor will be
negative. In order for the decision-making mechanism to arrive at an appro-
priate response in the situation of an incongruent adjective-noun pair it must
overcome the bias toward a "no" decision engendered by the syntactic processor.
Overcoming this bias will take time and consequently the lexical decision la-
tency will be slowed relative to the situation in which the adjective and noun
are in grammatical agreement.

A similar account can be given of the differences between grammatically
congruent and grammatically incongruent pseudoadjective-noun pairs. Here,
however, it must be assumed that the syntactic processor responds positively
when there is an agreement of inflection despite the fact that the contextual
item is nonsense. Thus, for the lexical decision on the second member of a
grammatically congruent pseudoadjective-noun pair, the lexical processor and
the syntactic processor will both feed positively to the decision maker; only
the message processor's output will be negative. This is in contrast to the
situation in which the inflections of the pseudoadjective and noun do not agree
grammatically, for in this situation only the lexical processor's output will
be positive. Consequently, the decision making will have to overcome more
negative biasing and be slowed proportionately more relative to the situa-
tion in which the pseudoadjective and noun are grammatically suited. For
pseudoadjective-noun pairs lexical decision is faster when the inflections
agree than when they do not agree.

Arguing from the perspective of Forster's (1979) language-processing sys-
tem, it might be expected that the rejection of pseudonouns should be retarded
by grammatical consistency. The negative outputs from the lexical processor
and message processor will contrast with the positive output from the syntac-
tic processor when the pseudonoun target and its context are in grammatical
agreement. To arrive at the appropriate "no" response the decision maker will
have to resolve the inconsistency of outputs and the bias to respond "yes." In
two previous experiments examining the effects of minimal grammatical contexts
on lexical decision it was observed that pseudonouns were rejected more slowly
when the preceding item was a grammatically congruent preposition (Lukatela et
al., 1983) and pseudoverbs were rejected more slowly when the preceding item
was a grammatically congruent personal pronoun (Lukatela et al., 1982). In
the present experiments, however, there is no statistically significant evi- 
dence for the slowing of negative decisions by grammatical agreement. 
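
The qualitative predictions of this account can be made concrete with a toy model. The sketch below is an assumption-laden illustration, not Forster's (1979) formal system: the response is taken to follow the lexical processor, each processor output that disagrees with that response adds a fixed cost, and the message processor is assumed to be negative throughout. The function name `decision_latency` and the values of `base_ms` and `conflict_ms` are hypothetical.

```python
def decision_latency(lexical, syntactic, message=False,
                     base_ms=650.0, conflict_ms=30.0):
    """Toy decision maker for a lexical decision trial.

    lexical, syntactic, message: True for a positive processor output,
    False for a negative one. The response follows the lexical processor
    ("yes" for a word, "no" for a pseudoword), and every processor output
    that disagrees with the response adds conflict_ms to the latency.
    base_ms and conflict_ms are arbitrary illustrative values.
    """
    response = lexical
    outputs = (lexical, syntactic, message)
    conflicts = sum(1 for out in outputs if out != response)
    return response, base_ms + conflict_ms * conflicts


# Noun targets: grammatical agreement should speed the "yes" response.
_, noun_congruent = decision_latency(lexical=True, syntactic=True)
_, noun_incongruent = decision_latency(lexical=True, syntactic=False)

# Pseudonoun targets: agreement should slow the "no" response.
_, pseudo_congruent = decision_latency(lexical=False, syntactic=True)
_, pseudo_incongruent = decision_latency(lexical=False, syntactic=False)

print(noun_congruent < noun_incongruent)      # True
print(pseudo_congruent > pseudo_incongruent)  # True
```

Only comparisons within a target type are meaningful here. The first ordering corresponds to the noun results; the second is the prediction for pseudonouns that the present experiments did not confirm.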

To account for the indifference of rejection responses to grammatical 
congruency requires making explicit a process that is implicit in the above 
account of acceptance responses, namely, suffix stripping. According to the 
view of Taft and Forster (1975), perceiving an inflected adjective or noun in-
volves decomposing the item into its stem and suffix (see also Taft, 1981;
Stanners et al., 1979). In performing lexical decision, the representation of
the stem morpheme is accessed by the lexical processor and the appropriateness
of the inflected ending is determined on the basis of the information stored
with the stem's representation. A similar decomposition must occur for
pseudoadjectives and pseudonouns except that for these items there would be no 
specific representation of the stem morpheme to be accessed, only close 
approximations. 

It might be supposed that where the lexical processor focusses on the 
word stem, the syntactic processor focusses on the bearers of grammatical 
information, i.e., roughly, the suffixes of open-class words and the free
morphemes of closed-class words. Whatever the bearers in any given con-
text-target situation, assessing a grammatical fit takes time. Indeed, the
difference between the present results and previous results with regard to
negative responses might suggest that discovering the grammatical consistency
in an adjective-pseudonoun or pseudoadjective-pseudonoun pair is slower than
discovering the grammatical consistency in, say, a preposition-pseudonoun
pair. The idea is that the longer the time taken by the syntactic processor
to arrive at an output the less the likelihood that the activity of the
syntactic processor will influence the time course of the lexical decision; an
internally defined deadline on response selection must be assumed. For the
preceding suggestion to be realizable it might have to be the case that (1)
the grammatical link between closed-class, function words (e.g., prepositions, 
pronouns) and open-class, content words (e.g., nouns, verbs, adjectives) is 
"stronger" and more rapidly assessed than the grammatical link between 
open-class content words (e.g., the link between adjectives and nouns); and 
(2) the syntactic processor can be influenced by the lexical processor. Re-
call that in the present experiments, although the lexical decision on a pseu-
donoun in the context of a pseudoadjective was not affected by grammatical
consistency, the lexical decision on a noun in the same context was markedly 
affected. In short, the lexical status of the target made a difference — and 
that status is determined by the lexical processor. 

In conclusion, evidence has been presented for the influencing of lexical 
decisions about inflected nouns by weakly associated inflected adjectives that 
are grammatically consistent or inconsistent with their target nouns. This 
effect seems to be mediated by a process that evaluates the grammar of a noun 
and its adjectival context primarily on the basis of the inflected morphemes. 
Although this effect, demonstrated in "nonpriming contexts" (Seidenberg et al.,
1982), can be referred to as grammatical priming (Lukatela et al., 1982;
Lukatela et al., 1983), it appears to be a postlexical effect related to, but
distinct from, the priming mechanisms of automatic spreading activation and
context-induced attentional processing (Neely, 1977; Posner & Snyder, 1975)
that have been identified in "lexical priming" contexts (Seidenberg et al.,
1982). Lukatela et al. (1982) concluded that the grammatical priming of 
inflected verbs by pronouns and vice versa was automatic. Their conclusion 
was based in part on the observation that pronominal facilitation of verbs was 
virtually complete when the onsets of context and target were separated by on- 
ly 300 ms. They recognized, however, that this automaticity did not refer to 
spreading activation. It is supposed that the present example of grammatical 
priming is also automatic but the kind of automaticity being referred to is 
closer to that suggested by de Groot et al.'s (1982) notion of an automatic 
checking for coherence (see also West & Stanovich, 1982) than it is to the 
more familiar notion of an automatic spreading of influences among connected 
representations in the internal lexicon. 
DEAF SIGNERS AND SERIAL RECALL IN THE VISUAL MODALITY: MEMORY FOR SIGNS,
FINGERSPELLING, AND PRINT*



Rena Arens Krakow† and Vicki L. Hanson



Abstract. This study investigated serial recall by congenitally,
profoundly deaf signers for visually specified linguistic informa-
tion presented in their primary language, American Sign Language
(ASL), and in printed or fingerspelled English. There were three
main findings. First, differences in the serial-position curves
across these conditions distinguished the changing-state stimuli
from the static stimuli. These differences were a recency advantage
and primacy disadvantage for the ASL signs and fingerspelled English
words, relative to the printed English words. Second, the deaf sub-
jects, who were college students and graduates, used a sign-based
code to recall ASL signs, but not to recall English words; this re-
sult suggests that well-educated deaf signers do not translate into
their primary language when the information to be recalled is in En-
glish. Finally, mean recall of the deaf subjects for ordered lists
of ASL signs and fingerspelled and printed English words was signif-
icantly less than that of hearing control subjects for the printed
words; this difference may be explained by the particular efficacy 
of a speech-based code used by hearing individuals for retention of 
ordered linguistic information and by the relatively limited speech 
experience of congenitally, profoundly deaf individuals. 

Hearing individuals have been shown to use a speech-based code in the 
short-term recall of linguistic information, whether spoken or printed (Con- 
rad, 1964; Wickelgren, 1965). Their recall performance is similar in the two 
cases except for a recency advantage favoring spoken over printed items in the 
last serial positions (Corballis, 1966; Murray, 1966). Because the orthogra- 
phy of English is a secondary representation derived from the primary or basic 
spoken language (Mattingly, 1972), it is not surprising that orthographic
representations are recoded into a speech-based code. In addition, a
speech-based code may be especially useful when the memory task calls for re-
call of ordered information (Baddeley, 1979; Crowder, 1978; Hanson, 1982; Hea-
ly, 1975).

The relations among primary language, coding strategy, and recall per-
formance become more difficult to unravel when we consider bilingual deaf in-
dividuals who use American Sign Language (ASL) as a primary language and En-
glish as a secondary language. The term "primary language" refers to a natur-
al language in the form in which it functions as a principal means of communi-
cation among members of a speech community. Writing systems and other invent-
ed representations that are based upon natural languages are viewed as
nonprimary derived systems. 

ASL is the primary visual-gestural language of the deaf community in the 
United States and Canada, and is acquired as a native language by children of 
deaf parents. Structural differences between signed and spoken languages re- 
flect differences between auditory-vocal and visual-gestural channels of 
communication. For example, spoken languages are characterized by sequential 
forms of structuring at the abstract phonological and morphological levels. 
Words are composed of sequentially arranged phonemes, and morphological pro- 
cesses typically add one or more prefixes and/or suffixes (each composed of 
one or a series of phonemes) to a stem. In contrast, ASL is strikingly dif- 
ferent from spoken languages in the extent to which it utilizes simultaneously 
structured units in lexical and morphological composition (Bellugi, 1980; Kli- 
ma & Bellugi, 1979). Signs, the lexical items of ASL, are composed of several 
co-occurring formational parameters (Stokoe, Casterline, & Croneberg, 1965), 
and morphological relations are expressed by spatial and temporal modifica- 
tions of the basic form of a sign (Bellugi, 1980). 1 

Those who use ASL as a primary means of communication also use 
fingerspelling for concepts lacking a sign. Fingerspelling is a manual form
of English orthography that assigns a unique hand configuration to every let-
ter of the English alphabet; as such, it is a changing-state representation of
the graphic form of a spoken language. Fingerspelling is not used as a pri-
mary means of communication by members of the deaf community (Battison, 1978).
Although fingerspelled words may often occur within signed sentences, this
letter-by-letter sequential representation of English words differs consider-
ably from the co-occurring formational parameters of ASL signs.

No writing system in use is based upon ASL, and educated deaf American 
signers read and write in English. But the use of ASL and of written or fin-
gerspelled English by deaf bilinguals is quite different from the use of two
spoken languages by hearing bilinguals. For a deaf person, learning the 
orthography (whether through writing or fingerspelling) of English means 
learning an orthographic visual system derived from a primary form to which he 
or she does not have normal access. In contrast, hearing bilinguals do have 
normal access to the primary forms of both languages that they use. Moreover, 
the significant structural differences between ASL and English at both the 
lexical and grammatical levels require the ASL-English bilingual to know two 
radically different forms of linguistic structuring. The bilingual who uses 
two spoken languages is required to know one form of linguistic structuring, 
that characterizing spoken languages. 
The present research examined serial-order recall by deaf signers and ad-
dressed the question of how coding strategies and recall performance are
affected by the requirement to remember ASL in contrast to English. Differ-
ences in performance that may stem from the presentation of English words by
fingerspelling and print were also examined. The hypotheses underlying this
work are discussed in the following sections on serial-position effects, cod- 
ing, and accuracy of recall. 

Serial-Position Effects 

Although hearing subjects use a speech-based code for recall of both spo- 
ken and printed word lists, auditory presentation results in a recency advan- 
tage over visual presentation (for a review of this research, see Penney, 
1975). This advantage for the more recently presented items occurs whether 
the experimenter or the subject reads the stimuli aloud. On the basis of such 
findings, the critical variable appears to be hearing the items. The "modali-
ty effect" was originally attributed to the fact that information in
pre-categorical acoustic storage (PAS) has greater durability than information 
in an iconic sensory representation (Crowder & Morton, 1969). 

However, further research provided evidence for similar effects in the 
visual modality in the absence of acoustic information, thus casting doubt on 
the PAS explanation for the recency advantage. Findings of recency advantages 
for ASL signs (Shand, 1980), moving hand shapes (Campbell, Dodd, & Brasher, 
1983), lipread items (Campbell & Dodd, 1980; cf. Crowder, 1983), mouthed 
items (Nairne & Walters, 1983), and items vocalized "aloud" by deaf subjects 
(Engle, Spraggins, & Rush, 1982) are all incompatible with an explanation 
based on acoustic advantage. 

Two alternative accounts to the PAS explanation have been proposed. 
First, the difference in recency favoring, for example, spoken, lipread, and 
signed information over orthographic information may reflect an advantage in 
recall of primary-language input over nonprimary (printed) input (Campbell & 
Dodd, 1980; Campbell et al., 1983; Nairne & Walters, 1983; Shand, 1980; Shand 
& Klima, 1981). Second, this effect may be attributed to an advantage in
remembering changing-state information over remembering static information
(Campbell & Dodd, 1980; Campbell et al., 1983; Nairne & Walters, 1983). Here-
after, the term "dynamic" will be used to mean "changing-state."

It is important to note that recall differences between lists of words 
that are heard and lists that are silently read are restricted to the recency 
portion of the curve, with a recency advantage for the words that are heard.
Thus, there is an overall advantage for the heard lists. However, the recency 
advantage for lipread and for mouthed lists does not yield an overall advan- 
tage over printed (silently read) lists. This is because recall of lipread 
and mouthed lists is poorer than recall of printed lists at earlier serial 
positions. Researchers have tended to focus on the similarity in recency ef- 
fects among mouthed, lipread, and spoken input conditions, without giving ade- 
quate attention to the fact that spoken input results in the best recall over- 
all. The dynamic-presentation hypothesis and the primary-language hypothesis 
must therefore be examined with respect to effects that span the entire seri- 
al-position curve. 

The present study was designed to separate serial position effects 
attributable to primary language from those attributable to dynamic presenta-
tion. Serial position functions that distinguished fingerspelled and printed
English lists from lists of ASL signs would provide support for the pri-
mary-language hypothesis. On the other hand, serial position functions that
distinguished the signed and fingerspelled lists from the printed lists would 
provide support for the dynamic-presentation hypothesis. 
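
Both hypotheses are stated in terms of serial position functions, so it may help to see how such a function can be computed. The following is a minimal sketch under an assumed strict-position scoring rule (an item counts as correct only when reported in the position in which it was shown); the scoring rule, the function name `serial_position_curve`, and the three-item toy lists are illustrative assumptions, not the materials or procedure of the present study.

```python
def serial_position_curve(presented, recalled):
    """Proportion of lists recalled correctly at each serial position.

    presented: list of lists, the items in their order of presentation.
    recalled:  list of lists, the items as reported, in reported order.
    An item is scored correct only if it appears in its original position.
    """
    n_positions = len(presented[0])
    hits = [0] * n_positions
    for shown, reported in zip(presented, recalled):
        for pos in range(n_positions):
            if pos < len(reported) and reported[pos] == shown[pos]:
                hits[pos] += 1
    return [h / len(presented) for h in hits]


# Invented toy input: two three-item lists, one error in the middle position.
presented = [["A", "B", "C"], ["D", "E", "F"]]
recalled = [["A", "X", "C"], ["D", "E", "F"]]
print(serial_position_curve(presented, recalled))  # [1.0, 0.5, 1.0]
```

Computing one such curve per presentation condition would allow the shapes of the curves (primacy and recency portions) to be compared across signed, fingerspelled, and printed lists.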

Coding

Research with deaf signers can also provide insight into the question of 
whether a code based on one's primary language is useful when the recall task 
involves information whose linguistic structure is quite different from that 
of the primary language. Shand (1982; Shand & Klima, 1981) suggested that the 
primary code is the natural and most efficient code for short-term recall of 
linguistic information. Recoding by hearing individuals from print into a 
speech-based code takes advantage of the systematic relation between the spo- 
ken form and its orthography (Mattingly, 1972). However, there is no such 
systematic relation between ASL signs and English orthography. 

Simultaneously occurring parameters of movement, place of articulation 
within the signing space, and hand configuration are the sublexical components
of ASL signs (Stokoe et al., 1965). These formational parameters (cheremes or
primes) evidently support recall of signs by deaf signers much as phonetic 
parameters of speech support recall of spoken information by hearing individu- 
als (Bellugi, Klima, & Siple, 1975; Hanson, 1982; Poizner, Bellugi, & Tweney, 
1981; Shand, 1982). Thus, Bellugi et al. (1975) found intrusion errors sug- 
gesting sign-based coding of ASL signs by deaf signers on a serial-recall 
task. The majority of the intrusion errors were signs that differed from a 
correct response by one formational parameter. For example, some of the sub- 
jects reported JEALOUS for CANDY. The signs for JEALOUS and CANDY are a mini- 
mal pair in that they have the same place of articulation and movement; they 
differ only in hand configuration. Likewise, some subjects reported NEWSPAPER 
for BIRD; these two signs share movement and hand configuration and differ on- 
ly in place of articulation. 

Evidence for both sign-based and speech-based recoding of printed words
by deaf subjects has been obtained in serial-order recall tasks (Hanson, 1982;
Lichtenstein, in press; Shand, 1982). Subject characteristics associated with
coding preferences suggest that speech-based recoding is typically used by
those prelingually, profoundly deaf adults who are better readers and who have
better speech production skills (Lichtenstein, in press). A shortcoming of
previous studies was that they compared the performance of different groups of
subjects on the different stimulus types. Furthermore, they never included
fingerspelled English. Presenting ASL signs, printed English words and fin-
gerspelled English words to the same group of deaf signers in the present
study made it possible to ascertain whether deaf individuals changed strate-
gies as the stimuli changed or maintained a preferred strategy, such as
sign-based or speech-based coding. In order to provide English words that
were compatible with a sign-based code, half of the fingerspelled and printed
words were chosen because they had readily available sign translations
("high-signability" words); the other half, because they did not ("low-signa-
bility" words). If deaf subjects recode into signs and recoding into one's
primary language is the most natural and efficient strategy (Shand, 1982),
then two outcomes might be predicted. First, high-signability words should be
recalled more accurately than low-signability words. Second, recall perform-
ance on high-signability words should provide evidence of sign-intrusion er-
rors.
Accuracy of Recall

In general, when congenitally, profoundly deaf individuals perform a task
that calls for ordered recall of English words or letters, they do not perform
as well as hearing subjects (Belmont & Karchmer, 1978; Belmont, Karchmer, & 
Pilkonis, 1976; Hanson, 1982; MacDougall, 1979; Wallace & Corballis, 1973). 
Belmont and Karchmer argued that the generally poorer performance of deaf in- 
dividuals reflects a "mismatch" between the native language (ASL) and the lan- 
guage of the information to be recalled (English). However, even on seri- 
al-recall tasks involving ASL signs, deaf signers do not remember as many 
items as hearing subjects tested on the signs' printed (Hanson, 1982) or spo- 
ken English equivalents (Bellugi et al., 1975). Moreover, Hanson found that 
deaf subjects did perform as well as hearing subjects on tasks that called for 
free recall of printed English words. The nature of the ordered-recall task,
rather than characteristics of the input, may actually favor hearing individu- 
als. 

Recent studies indicate that the speech code is particularly useful for
retaining order information (Baddeley, 1979; Crowder, 1978; Hanson, 1982; Hea-
ly, 1975). For deaf subjects, accuracy of recall has been found to correlate
with the use of a speech-based code; those who use this code efficiently re-
call more than those who use it inefficiently or not at all (Conrad, 1979;
Hanson, 1982; Lichtenstein, in press). Therefore, it seems that the
speech-based code may facilitate serial-order recall in a way that alternative
coding mechanisms, including sign-based coding of ASL signs, do not. Further-
more, it is likely that the use of the speech-based code by deaf individuals
is not as effective as it is for hearing people. The present study examined
the recall performance of deaf subjects, who were highly proficient in English
as well as in ASL, and asked whether accuracy of recall differs as a function
of the type of linguistic input (dynamic vs. static; primary vs. nonprimary)
or whether serial recall is, regardless of input characteristics, a particu-
larly difficult task for individuals who do not have normal access to speech.

Experiment 

This experiment compared the performance of congenitally, profoundly deaf
signers when presented with English words and ASL signs for serial-order re-
call. The presentation mode of the English words was varied so that some were
printed and others were fingerspelled. All the deaf subjects used ASL as
their primary means of communication. The recall performance of two groups of
deaf subjects was compared in order to find out whether there are performance
differences between native and nonnative signers. Members of one group ac-
quired ASL as a native language from deaf parents, and members of the other
group learned ASL outside the home in the early school years. A normal-hear-
ing control group was tested on the printed stimuli.

Method 

Subjects 

All subjects were tested individually and were paid for their participa- 
tion. 

Deaf subjects. Twenty congenitally, profoundly deaf subjects participat-
ed in the short-term memory experiment; two were eliminated because their
hearing loss was less than the criterion for profound deafness (85 dB, bet-
ter-ear average). Background information gathered from the subjects indicated
that they all used ASL as their primary means of communication, supplemented 
by fingerspelling. Eight of the subjects were born to deaf parents and had 
acquired ASL as a native language (native signers), while 10 of the subjects 
had hearing parents and learned ASL outside the home in the early school years 
(nonnative signers). All subjects were currently attending or were recent 
graduates of Gallaudet College, a liberal arts college for deaf students. 

Twenty congenitally deaf adults served as control subjects on a perceptu- 
al task, described below. Nine of these subjects had participated in the mem- 
ory experiment several months before. Each had a hearing loss of at least 70 
dB in the better ear. They were all students or graduates of Gallaudet Col- 
lege and reported using ASL as a primary means of communication. 

Hearing subjects. Ten hearing subjects were recruited from among Yale
University students and affiliates. They were native speakers of English who 
reported no history of hearing impairment. Because the hearing subjects were 
tested on both sets of printed stimuli, 10 subjects provided sufficient data 
for comparison with the deaf subjects. 



Stimuli

Stimulus lists were constructed from 141 high-signability (HS) English
nouns and 94 low-signability (LS) English nouns. All were words considered to
be commonly known by college-age adults, and were selected with the assistance
of a deaf native signer. HS words were matched with LS words for frequency of
occurrence in printed English (Kučera & Francis, 1967). HS words were
randomly assigned to each of three presentation conditions: signs, fingerspelling,
and print. LS words were randomly assigned to fingerspelling or print
conditions. These assignments produced one set of stimuli. A second set of
stimuli was constructed by reassigning printed items to fingerspelling or signs,
reassigning fingerspelled items to signs or print, and reassigning signed
items to print or fingerspelling, in order to partially counterbalance the
assignment of words to presentation conditions. Thus, the following five
conditions were obtained for both sets of stimuli: (1) American Sign Language
signs; (2) HS fingerspelled English words; (3) LS fingerspelled English words;
(4) HS printed English words; and (5) LS printed English words. Each
condition contained 42 nouns, in seven lists of 6 nouns each. Previous work with
deaf subjects indicated that a list containing 6 nouns could be expected to
produce both primacy and recency serial position effects (Bellugi et al.,
1975). An additional 5 lists of 5 nouns provided practice blocks.

All stimulus lists were videotaped at a rate of 2 sec per trial. A
native signer recorded the signed and fingerspelled lists on videotape; for
maximal visibility, she was framed from forehead to waist. The signer maintained
a neutral expression throughout the taping session. Printed words were
videotaped directly from an Atari 400 computer and were displayed for 1.5 sec with
a .5-sec interstimulus interval. Stimuli in each condition were recorded in
seven continuous lists of six nouns each. One practice list preceded each of
the five conditions.
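The noun totals reported for the stimulus pools can be checked against the list structure described above. The following is a minimal sketch, assuming that the practice lists (one per condition, five nouns each) were drawn from the same HS and LS pools as the test lists; the constant and function names are illustrative only and do not come from the original study.

```python
# Design constants taken from the description of the stimuli.
LISTS_PER_CONDITION = 7     # test lists in each condition
NOUNS_PER_LIST = 6          # nouns in each test list
PRACTICE_LIST_LENGTH = 5    # nouns in each practice list

# Conditions drawing on each word pool within one stimulus set.
HS_CONDITIONS = ["signs", "HS fingerspelling", "HS print"]
LS_CONDITIONS = ["LS fingerspelling", "LS print"]


def nouns_needed(n_conditions: int) -> int:
    """Test nouns plus one practice list per condition (an assumption)."""
    test_nouns = n_conditions * LISTS_PER_CONDITION * NOUNS_PER_LIST
    practice_nouns = n_conditions * PRACTICE_LIST_LENGTH
    return test_nouns + practice_nouns


hs_total = nouns_needed(len(HS_CONDITIONS))
ls_total = nouns_needed(len(LS_CONDITIONS))
print(hs_total, ls_total)
```

Under this assumption the totals agree with the 141 HS and 94 LS nouns reported in the text.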



Procedure

The order in which stimulus conditions were presented was partially
balanced across subjects as follows: There were five orders of presentation for
each stimulus set and no condition ever occurred in the same ordinal position
twice. Four subjects were tested on each of the orders. Order 1 was based on
differences in mode of presentation: (a) Signs; (b) HS Fingerspelling; (c) LS
Fingerspelling; (d) HS Print; (e) LS Print. Order 2 was also based on mode
differences but it involved a rearrangement of the ordering of signs,
fingerspelling, and print modes: (a) HS Print; (b) LS Print; (c) HS
Fingerspelling; (d) LS Fingerspelling; (e) Signs. Order 3 arranged lists by
signability differences: (a) LS Print; (b) LS Fingerspelling; (c) HS Print;
(d) Signs; (e) HS Fingerspelling. Order 4 arranged lists by signability in a
different ordering than Order 3: (a) HS Fingerspelling; (b) HS Print;
(c) Signs; (d) LS Print; (e) LS Fingerspelling. Order 5 mixed modes and
signability in a random fashion: (a) LS Fingerspelling; (b) Signs; (c) LS Print;
(d) HS Fingerspelling; (e) HS Print.
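The statement that no condition ever occurred in the same ordinal position twice amounts to saying that the five orders form a Latin square. The sketch below checks this property for the five orders as listed above; the function name is illustrative and is not part of the original procedure.

```python
# The five presentation orders, transcribed from the Procedure section.
ORDERS = {
    1: ["Signs", "HS Fingerspelling", "LS Fingerspelling", "HS Print", "LS Print"],
    2: ["HS Print", "LS Print", "HS Fingerspelling", "LS Fingerspelling", "Signs"],
    3: ["LS Print", "LS Fingerspelling", "HS Print", "Signs", "HS Fingerspelling"],
    4: ["HS Fingerspelling", "HS Print", "Signs", "LS Print", "LS Fingerspelling"],
    5: ["LS Fingerspelling", "Signs", "LS Print", "HS Fingerspelling", "HS Print"],
}


def is_latin_square(orders):
    """True if every condition appears once per order and once per position."""
    rows = list(orders.values())
    conditions = set(rows[0])
    if len(rows) != len(conditions):
        return False
    # Each order must contain every condition exactly once.
    rows_ok = all(len(row) == len(conditions) and set(row) == conditions
                  for row in rows)
    # Each ordinal position must contain every condition exactly once.
    columns_ok = all(set(column) == conditions for column in zip(*rows))
    return rows_ok and columns_ok


print(is_latin_square(ORDERS))
```

Each condition therefore appears once in each of the five ordinal positions across the set of orders.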

Deaf subjects were tested on all five conditions by a native signer who 
provided both printed and signed instructions; nine of the subjects were test- 
ed on one set of stimuli and nine on the other. The subjects were told that 
they would see lists of nouns presented by various modes: ASL signs, printed 
English, and fingerspelled English. A message printed on the screen indicated
the termination of each list. The subjects were instructed to watch the 
screen and to write the words they had just seen, in serial order, on the an- 
swer sheet provided. The answer sheet included the numbers 1 through 6 for 
each list with blank spaces for responses. The subjects were not prevented 
from recording words in any order. It was, however, required that words
appear in their correct serial positions. Bellugi and Siple (1974) reported
that deaf signers' recall performance with written report of signs was as good
as their recall performance with signed report.

To control for possible dialectal variations on the interpretations of 
the signs and to ensure a fair scoring procedure, a control group of deaf sub- 
jects was tested in a perceptual task. These subjects were asked to watch the 
signed portions of the videotapes and to simply write down the English trans- 
lation of each sign. 

The hearing subjects were tested by a hearing experimenter who provided 
both printed and spoken directions. Stimuli for the hearing subjects, who 
served as partial controls in this experiment, consisted of the printed condi- 
tions only. Each hearing subject saw both sets of printed stimuli. 

Scoring 

All subjects' responses in the memory task were scored as follows: Items
were marked correct if they appeared in the proper serial position in the
current list. Dialectal differences were taken into account when scoring the
answer sheets from signed trials; a response on the memory task that matched a
response in the correct serial position on the perceptual task was scored as
correct. Because there were seven lists in each condition, seven was the
maximum score possible at each serial position for each condition.
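The scoring rule can be illustrated with a short sketch. It assumes that responses are compared as lowercase strings and that the English glosses produced by the control group in the perceptual task are available as a set of accepted variants for each serial position; the function names and the example words (including the variant "couch" for "sofa") are hypothetical and are not data from the study.

```python
def score_list(targets, responses, accepted_variants=None):
    """Return a 0/1 score for each serial position of one list.

    accepted_variants maps a serial position (0-based) to glosses that the
    perceptual-task control group gave for the sign in that position.
    """
    accepted_variants = accepted_variants or {}
    scores = []
    for position, target in enumerate(targets):
        response = responses[position] if position < len(responses) else ""
        allowed = {target.lower()}
        allowed.update(v.lower() for v in accepted_variants.get(position, ()))
        scores.append(1 if response.strip().lower() in allowed else 0)
    return scores


def position_totals(scored_lists):
    """Sum scores across lists, giving one total per serial position."""
    return [sum(column) for column in zip(*scored_lists)]


# Hypothetical list: position 2 left blank, position 4 wrong,
# position 5 answered with a gloss accepted from the perceptual task.
targets = ["tree", "river", "chair", "window", "sofa", "clock"]
responses = ["tree", "", "chair", "door", "couch", "clock"]
one_list = score_list(targets, responses, accepted_variants={4: ["couch"]})
print(one_list)
print(position_totals([one_list] * 7))
```

With seven lists per condition, the total at any serial position can range from 0 to 7, as stated above.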
Results

A three-way ANOVA examined the within-subjects effects of presentation
condition (ASL signs, printed English, fingerspelled English) and serial
position (one through six), and the between-subjects effect of group (native
or nonnative signers) on the number of words the deaf subjects recalled
accurately. For the purposes of this analysis, performance on high- and
low-signability lists was averaged. The analysis revealed a significant main
effect of serial position, F(5, 80) = 30.01, p < .0001, and no significant
effect of either group or condition (both Fs < 1.00). These latter results
indicated that native and nonnative signers could not be differentiated on the 
basis of their performance on these serial-recall tasks and that their recall 
accuracy was similar for the three presentation conditions. There was,
however, a significant condition X position interaction, F(10, 160) = 3.33, p <
.001, indicating differential effects on the serial-position curve as a
function of condition. This interaction is shown in Figure 1, in which mean
recall is plotted at each serial position for the three conditions: ASL signs,
fingerspelled English words, and printed English words. In this figure, we
have pooled the high- and low-signability trials and averaged across the two
groups of deaf subjects.

Figure 1. Mean number of printed, fingerspelled, and signed items correctly
recalled by deaf subjects at each serial position.

Additional analyses were undertaken in order to understand the nature of
the interaction. The competing hypotheses regarding the effects of primary 
language vs. those of dynamic presentation prompted examination of the
differences in serial-position effects as a function of condition. To test the
primary language hypothesis, one ANOVA compared performance on the print
condition to that on the fingerspelling condition. This was a three-way analysis,
as above, with the exclusion of the sign condition. In this comparison, the
condition X position interaction was also highly significant, F(5, 80) = 6.84,
p < .0001. Thus, the serial-position curves for the print and fingerspelling
conditions differed. Performance on the signed and fingerspelled trials was
compared in the same way. In this ANOVA, the condition X position interaction
disappeared, F(5, 80) = 1.69, p > .05. This lack of a significant interaction
indicates no difference in the serial-position curves for the signed and
fingerspelled trials. To complete the comparison of dynamic and static
conditions, an ANOVA was performed on the printed and signed trials; the results
showed a significant condition X position interaction, F(5, 80) = 3.66, p <
.01. As is evident in the figure, the deaf subjects were never at ceiling in
their recall performance. 

Taken together, these analyses indicate that the condition X position 
interaction in the original analysis was due to differences between the print
condition on the one hand and the fingerspelling and sign conditions on the
other. This is consistent with the hypothesis that recall of dynamic and
static forms of linguistic information produces different serial-position
curves. In order to localize the effects of dynamic vs. static input on the
serial-position curve, contrasts were done at each serial position, going back
to the original analysis, by comparing recall performance in the static
condition (print) with that in the dynamic conditions (fingerspelling and signs).
The contrast was significant at Position 1, F(1, 34) = 10.49, p < .01, and
Position 2, F(1, 34) = 4.99, p < .05, with accuracy greater in the print
condition than in the other two conditions. The contrast was also significant
at Position 5, F(1, 34) = 10.05, p < .01, and Position 6, F(1, 34) = 8.67, p <
.01, with accuracy greater in the sign and fingerspelling conditions than in
the print condition. The contrast was not significant at Position 3 or
Position 4 (both Fs < 1.00). These results indicate that there is a recency
advantage for the dynamic information (signed and fingerspelled) but a primacy
advantage for the static information (printed). The existence of some recency 
gains in all conditions probably reflects the relatively short list length and 
the freedom of subjects to record the items they remembered in any order they 
wished. 

To test specifically for the effects of signability on recall, a 3-way 
ANOVA was performed on the recall accuracy for the within-subjects factors of
signability (HS, LS) X mode (fingerspelling, print) X serial position (1-6).
Because the group factor never entered into any significant main effects or
interactions, native and nonnative subjects were pooled in this and subsequent
analyses. The main effect of signability was nonsignificant (F < 1.00); thus,
the availability of a direct sign translation for an English word did not
enhance its recall. Mean recall of all deaf subjects on the 6-item lists was
3.16 for the HS stimuli and 3.12 for the LS stimuli. The main effect of mode
was also nonsignificant (F < 1.0), and the ANOVA revealed a significant
mode X position interaction, F(5, 85) = 7.42, p < .0001, reflecting the
differences in serial-position effects between static printed input and
dynamic fingerspelled input. As in the previous analysis, the main effect of
serial position was highly significant, F(5, 85) = 30.91, p < .0001.

The analysis of signability indicated that if deaf subjects were using a
sign-based code to recall English words, it was not to their advantage.
However, no evidence of sign-based coding of fingerspelled or printed English
words was obtained in an analysis of the intrusion errors. Two deaf native
signers of ASL examined each error on the sign trials and on the HS
fingerspelling and print trials and judged whether or not each was
formationally similar to the target item (i.e., a sign intrusion). Disagreements
between the two signers were rare (occurring on only 4 of the 63 errors that did
not include misorderings or blanks) and when they occurred, they were resolved 
by consulting a vocabulary book on ASL signs (O'Rourke, 1978). Error analysis 
of the sign trials showed that of the 63 errors, 30 were sign intrusions. The 
results of the perceptual task indicated that these sign intrusions were not 
due to perceptual confusions. (Many of the remaining errors consisted of 
words that were formationally similar to a word in another position in the 
same list.) Table 1 lists examples of sign intrusion errors and the corre- 
sponding target signs for the same serial positions in the recorded list of 
signs. In contrast, errors made on the fingerspelled and printed English
conditions did not tend to be sign intrusions. The 79 errors on the HS trials
(not counting misorderings and blanks) included only a single response that
had a sign similar to that of the target item. This was the intrusion of
"caution" for "warning," which is also semantically related. The other 78
errors could not be differentiated in kind from errors on corresponding LS
printed and LS fingerspelled lists. Errors made on fingerspelled and on printed
lists appeared to be of the same general type, as indicated by the examples of
errors on HS lists provided in Table 2. Patterns of visual resemblance of
item and error pairs are obvious. Such errors could reflect either visual or
phonological confusions; the present experiment was not designed to
distinguish between these two possibilities. Taken together, these results suggest
that well-educated deaf signers employ sign-based coding in retention of ASL
signs but not in retention of English words, whether printed or fingerspelled.
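The contrast between the two error patterns is easier to see as proportions. The following sketch simply converts the counts reported above into percentages; it assumes that the 63 errors on which the judges' disagreements are reported are the same 63 sign-trial errors, as the text appears to indicate, and the variable names are illustrative.

```python
# Counts reported in the error analysis (misorderings and blanks excluded).
SIGN_TRIAL_ERRORS = 63
SIGN_INTRUSIONS_ON_SIGN_TRIALS = 30
JUDGE_DISAGREEMENTS = 4
ENGLISH_HS_ERRORS = 79
SIGN_INTRUSIONS_ON_ENGLISH_HS_TRIALS = 1


def percent(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


intrusion_rate_signs = percent(SIGN_INTRUSIONS_ON_SIGN_TRIALS, SIGN_TRIAL_ERRORS)
intrusion_rate_english = percent(SIGN_INTRUSIONS_ON_ENGLISH_HS_TRIALS,
                                 ENGLISH_HS_ERRORS)
judge_agreement = percent(SIGN_TRIAL_ERRORS - JUDGE_DISAGREEMENTS,
                          SIGN_TRIAL_ERRORS)

print(intrusion_rate_signs, intrusion_rate_english, judge_agreement)
```

On these counts, nearly half of the errors on sign trials were sign intrusions, against roughly one percent of the errors on the HS English trials.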

Finally, recall accuracy of the deaf subjects on the printed trials was 
compared with that of the hearing subjects. Collapsing the data across all 
deaf subjects, mean recall on the six-item printed blocks was 3.14. (It 
should be remembered that for the deaf subjects, mean recall did not differ 
significantly as a function of condition: average recall on the fingerspel- 
ling and sign conditions was 3.10 and 3.17, respectively.) Mean recall of the 
hearing subjects on the printed blocks was 4.87, and many of them were at 
ceiling. An analysis comparing mean recall of the deaf subjects with that of 
the hearing subjects indicated that there was a significant difference in the 
accuracy of subjects as a function of group (deaf or hearing), t(26) = 6.85, p
< .0001. No valid tests of parallel serial position differences could be used
due to the ceiling performance of so many hearing subjects.
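Several of the degrees of freedom reported in this section can be recovered from the design. The sketch below assumes the standard formula for a within-subjects effect in a mixed design, in which the error term has the effect's degrees of freedom multiplied by the number of subjects minus the number of groups, and the usual N1 + N2 - 2 for an independent-groups t test. The subject counts are those given in the Method section; the names are illustrative.

```python
N_DEAF = 18        # 8 native + 10 nonnative signers
N_HEARING = 10
N_POSITIONS = 6


def within_error_df(effect_df, n_subjects, n_groups):
    """Error df for a within-subjects effect (standard mixed-design formula)."""
    return effect_df * (n_subjects - n_groups)


position_df = N_POSITIONS - 1

derived = {
    # Original analysis: group (2) x condition (3) x position (6).
    "position": (position_df, within_error_df(position_df, N_DEAF, 2)),
    "condition x position (3 conditions)": (
        (3 - 1) * position_df,
        within_error_df((3 - 1) * position_df, N_DEAF, 2),
    ),
    # Pairwise analyses: group (2) x condition (2) x position (6).
    "condition x position (2 conditions)": (
        (2 - 1) * position_df,
        within_error_df((2 - 1) * position_df, N_DEAF, 2),
    ),
    # Signability analysis with native and nonnative signers pooled.
    "mode x position (groups pooled)": (
        (2 - 1) * position_df,
        within_error_df((2 - 1) * position_df, N_DEAF, 1),
    ),
}
t_test_df = N_DEAF + N_HEARING - 2

for effect, df in derived.items():
    print(effect, df)
print("deaf vs. hearing t test", t_test_df)
```

These values match the reported F(5, 80), F(10, 160), F(5, 85), and t(26).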

Discussion 

In the present experiment, there was no significant difference in per- 
formance between the native and nonnative signers tested. This suggests that 
native signers and nonnative signers who learned ASL at an early age form a 
homogeneous subject group; as far as these tasks are concerned, ASL functions 
as a primary language in the same way for both. 

Serial-position effects were examined in order to test the
dynamic-presentation hypothesis against the primary-language hypothesis by
comparing deaf signers' recall of English print, fingerspelling, and ASL signs. The
results revealed that the serial-position curves were similar for the two
types of dynamic stimuli (fingerspelling and signs) and that these curves
differed from those obtained for the static stimuli (print). Recall was better
for dynamic stimuli in the last two serial positions but worse in the first
two serial positions.

Table 1

Item and Error Pairs in Recall of ASL Signs

Target Item      Intrusion Error    Parameters of Difference

danger           algebra            movement
zero             photograph         handshape
telegram         declination        handshape
secret           patience           movement
debt             this               movement
instructions     iceskating         handshape
pope             princess           movement, location
fence            screen             handshape, location
rosary           interpreter        movement, location
sandwich         school             movement, location

Table 2

Item and Error Pairs in Recall of Fingerspelled
and Printed English Words

        FINGERSPELLING                       PRINT

Target          Error            Target          Error

diamond         almond           heart           horse
wrestling       recycling        concept         corn
ceremony        cemetery         leaf            leather
pipe            pope             interference    inference
bomb            bubble           rosary          rosemary
noon            noun             digit           dignity
temptation      temperature      outlaw          outline
vinegar         vineyard         cure            burn
cure            sure             antique         unique

The recency advantages found for fingerspelled English words and ASL
signs add to a growing body of results indicating that "modality effects" can
be obtained even in the absence of acoustic input (Campbell & Dodd, 1980;
Campbell et al., 1983; Engle et al., 1982; Nairne & Walters, 1983; Shand,
1980). However, the present results are inconsistent with the
primary-language hypothesis, according to which differences would have been expected
between the serial-position curves for the primary-language items (ASL signs)
and those for the nonprimary-language items (fingerspelling and print).
Rather, the present findings provide support for the hypothesis that the "modality
effect" is a reflection of a recency advantage that accrues to dynamically
presented information, regardless of input modality. The primacy advantage
found for printed stimuli over fingerspelled and signed stimuli resembled the
primacy advantage for printed over lipread and mouthed stimuli reported in
previous studies (Campbell & Dodd, 1980; Nairne & Walters, 1983). As
mentioned earlier, the comparison between hearing subjects' recall of spoken and
of printed words reveals only a recency difference between the two conditions,
and consequently, an overall advantage for the spoken words. But it appears
that in spite of the recency advantage for nonacoustic dynamic stimuli (e.g.,
signs and lipread, mouthed, and fingerspelled words), such stimuli show no
overall advantage over static stimuli (printed words). What is important to
note in all of these studies is that dynamic information (whether spoken,
signed, fingerspelled, etc.) and static information (printed) yield different
serial-position curves.

As in previous research (Bellugi et al., 1975), analysis of the deaf
subjects' intrusion errors revealed sign-based coding of the ASL signs. However,
the lack of sign intrusion errors on both printed and fingerspelled English
lists suggests that well-educated deaf persons do not recode English words
into signs. In addition, there was no recall advantage for those English words
that have direct sign translations. These results are especially noteworthy 
because they suggest that deaf bilinguals can change their recall strategies 
depending upon whether they are presented with information in English or in 
ASL. 

The number of items recalled by deaf signers did not differ as a function 
of language, signability, or dynamic-static differences. But their mean
recall was significantly less than that of hearing subjects when the performance
of both groups on the printed trials was compared. These results are not
consistent with the view that the generally poorer performance on serial-recall
tasks by deaf subjects than by hearing subjects stems from the requirement to
remember English. In conjunction with earlier findings that deaf signers
perform as well as hearing individuals on free-recall tasks involving English
stimuli (Hanson, 1982), the present study indicates a specific difficulty on
the part of the deaf signers with serial-order recall.

It is important to realize that difficulties deaf individuals may have 
with serial-recall tasks need not interfere with their primary-language
abilities in ASL because of ASL's emphasis on simultaneous production of
linguistic units. But serial-recall performance may become a problem when
deaf individuals learn a spoken language. English, even more than some other
spoken languages, relies heavily on word order in syntactic structuring. Not 
surprisingly, deaf children have difficulty in learning to read and write the 
complex syntactic structures of English, which place a heavy load on memory 
for ordered units (Russell, Quigley, & Power, 1976), and deaf individuals usu- 
ally do not read as well as their hearing peers (Bornstein & Roy, 1973; 
Karchmer, Milone, & Wolk, 1979). If we are to improve our methods for teach- 
ing deaf persons to read and write, it is crucial that we gain more insight 
into the strategies that deaf individuals bring to bear when remembering En- 
glish letters, words, and sentences, and the ways in which deafness affects 
the perception of and memory for sequential flow of linguistic information.

Sequential structuring does, of course, play a role in ASL, much as
simultaneous structuring does in speech. The essential difference is in the
extent to which sequential structure or parallel structure is part of the
abstract organization of the language. Studdert-Kennedy and Lane (1980) suggest
that speech draws on parallel organization (coarticulation, for example) to
implement an abstract sequential linguistic structure, while ASL draws on
sequential organization of its gestures to implement an abstract parallel
linguistic structure. For example, in ASL the formation of a sign's handshape
may precede the start of its movement. Clearly, there is also a sequential 
component in ASL syntax.

DID ORTHOGRAPHIES EVOLVE?*

Ignatius G. Mattingly†



Abstract. According to Gelb (1963), writing has "evolved" from picture
writing to logography to syllabic writing to alphabetic writing.
It is argued here that this widely accepted theory of orthographic
evolution does not really fit the historical facts very
well, and that the variety of orthographies is better explained on 
linguistic grounds. Orthographies have to be productive, and they 
can manage this only by providing devices for transcribing the 
possible words in the lexicon. The very limited number of different
ways in which this is accomplished in different orthographies is 
accounted for by the structural peculiarities of the languages that 
the orthographies transcribe. 

It is generally believed by linguists, psychologists, psycholinguists and 
educators that writing has "evolved." First there was picture writing, then 
came logographies, then syllabaries, and finally, the alphabet. At each of 
these stages of development, writing became more efficient, because a smaller
inventory of signs was required to do the job. The alphabet is the
culmination of this evolutionary process, and its nearly universal triumph over less
efficient orthographies has been well deserved. 

The evolutionary view of writing probably originated during the
nineteenth century, when most of the decipherments that led to our present
knowledge of ancient writing systems took place, and theories of cultural
evolution, inspired by the theory of biological evolution, were in vogue. The
evolutionary view can be found in one form or another in many of the standard 
accounts of the history of writing. Thus Jensen (1970): 

In the broader history of writing we can see then certain evolutionary
tendencies emerging. Above all it is governed by the law of
least resistance, according to which every change must in the normal
way run from the more difficult to the more easy, from the more
complicated to the more simple; we find, furthermore, in keeping
with the general development of civilisation, an increasing
abstraction, a certain assimilation of the form to the self-increasing
intellectuality of the content. (p. 22)

(Cf. also Pedersen, 1962, chap. VI.) And the evolutionary view has been 
elaborated into a theory by Gelb, whose A Study of Writing (1963) most of us
who are interested in the psychology of reading turn to for enlightenment
about the natural history of writing. 

Gelb says that "writing had its origin in simple pictures" (p. 190), ad- 
vanced to "semasiography" (that is, picture writing), and then to "phonogra- 
phy," which comprehends word-syllabic, syllabic, and alphabetic writing 
(p. 191). The development of writing is said to be "unidirectional" (p. 200): 

What this principle means in the history of writing is that in 
reaching its ultimate development writing, whatever its forerunners 
may be, must pass through the stages of logography, syllabography,
and alphabetography in this, and no other, order. Therefore, no
writing can start with a syllabic or alphabetic stage unless it is
borrowed, directly or indirectly, from a system which has gone
through all the previous stages. A system of writing can naturally
stop at one stage without developing farther. Thus a number of
writings stopped at the logographic or syllabic stage. (p. 201)

Thus, just as biological evolution explains the variety of natural species, 
orthographic evolution is said to explain the variety of orthographic species. 

What I wish to do here is to reconsider the theory of orthographic
evolution. I will argue that the evolution of writing has been more apparent than
real, and that the variety of orthographic species is better understood from a
standpoint more linguistic than Gelb adopts. The alphabet, I will suggest, is
not necessarily the best way to write all languages. For the evidence that
leads to these conclusions, I rely mainly on the remarkable erudition of Gelb
himself. 



Is this a matter of more than marginal concern for the psychology of 
reading and spelling? I suggest that it may be, for the evolutionary view is 
echoed by psychologists concerned with the reading process (Crowder, 1982, 
p. 148; Henderson, 1982, p. 7), and the supposed evolution of writing is
sometimes taken to reflect psychological facts and even to suggest teaching
strategies. Citing Gelb (1963), Gleitman and Rozin (1977) say:

...each orthography arose as a gradual refinement and generalization of
resources already implicitly available in its predecessors, as though the
early scripts formed the necessary conceptual building blocks required
for further development....On these grounds, one can build a plausibility
case (though only that) for organizing reading instruction in terms of a
similar accumulation of conceptions: perhaps ontogeny recapitulates
cultural evolution. (p. 8)

Let us begin with the claim that logography evolved from picture writing.
There are seven ancient traditions of logographic writing: the Mesopotamian,
Proto-Elamite, Proto-Indic, Sino-Japanese, Egyptian, Cretan, and Hittite.
Decipherment has not progressed very far in the cases of Proto-Elamite,
Proto-Indic, and the early Cretan writing, but in the case of the other
logographic traditions there is evidence that the signs were at first iconic
(Gelb, 1963, chap. II), only later becoming arbitrary and non-iconic. The
obvious explanation for this development is that while iconic signs were
suitable for monumental inscriptions, hieratic, commercial, and literary uses
required signs that could be rapidly written rather than slowly drawn. There
was thus an evolution from iconic to non-iconic writing. But regardless of
their graphic form, the signs were from the beginning logograms: they stood
for words (or more correctly, morphemes), not, as is sometimes said,
"concepts" or "meanings." An iconic sign designated a particular word by
suggesting some aspect of its meaning, but the meaning of the logographic text
did not depend on these pictorial hints, but on the selection and ordering of
the words, just as it does in spoken and written language in general.
Non-iconic signs, arbitrarily associated with words, served the purpose equal- 
ly well. 

Picture writing, on the other hand, is non-linguistic. The term is a 
convenient cover label for a fascinating miscellany of assorted artifacts from 
preliterate societies: rock-drawings warning of danger nearby, pictorial 
"letters," narratives and proverbs, tribal and commercial identification 
symbols, calendar systems, and so on (Gelb, 1963, chap. II). 

In what sense can logography be said to have evolved from picture writ- 
ing? The claim would have some substance if it could be shown that the signs 
of some logography were borrowed from or paralleled those of a particular 
tradition of picture writing, but there appears to be no example of this sort 
in any of the logographic traditions. The Mesopotamian Sumerians used both 
cylinder seals and logographic writing on commercial identification tags, but 
there is no relationship between the seals and the writing (Gelb, 1963, 
p. 65). If cultural evolution means anything, it must imply some kind of 
structural development: thus the computer can reasonably be said to have 
evolved from the loom. But linguistic writing merely took over the 
communicative functions of picture writing, as the internal combustion engine 
took over the locomotive functions of the horse; it did not, in any interest- 
ing sense, evolve from picture writing. 

The second part of Gelb's theory is that syllabaries evolved from
logographies. This claim implies that within a particular orthographic tradi- 
tion, there is a period of strictly logographic writing, then, perhaps, a 
transitional period, and then a period of strictly syllabic writing. But what 
we actually find, in the Mesopotamian, Hittite and Sino-Japanese traditions 
(Egyptian will be discussed shortly) is just the transitional period. 

The writing in these traditions is what Gelb aptly calls "word-syllabic" 
writing, in which logograms and syllabary signs supplement each other. Thus, 
in Sumerian and in Japanese writing, the syllable signs are used regularly to 
write inflectional morphemes and can also be used to write base morphemes.
Alternatively, a base morpheme can be written with a logogram, and in this 
case, a supplementary syllable sign is sometimes used to indicate the phono- 
logical form of the morpheme. In Chinese writing, some of the characters are 
simple logograms, but most of them consist of two component signs: the "radi- 
cal," one of 214 signs that serve as semantic classifiers, and the "phonetic 
complement," a sign that in isolation has a phonological value similar or 
identical to that of the compound character. The compound character for 
/ku³/, blind, for instance, is composed of the simple signs for /ku³/, drum,
and /mu⁴/, eye (Jensen, 1970, p. 170).¹ Since the phonetic complements have
logographic values of their own, and there are in general quite a few phonetic
complements for a particular syllable (10 for for example; Wieger,
1927), it might seem a bit eccentric to regard Chinese writing as
systematically syllabic, rather than simply as a case of massive phonetic transfer.
But the fact that a common error in the writing of Chinese is the use of an
incorrect but phonologically accurate phonetic complement (H.-B. Lin, personal
communication) attests to the psychological reality of the syllabary system. 

In all these word-syllabic orthographies, the syllable signs clearly 
derive from logograms. Thus the syllable sign for /gal/ in Sumerian derives 
from the logogram for /gal/, great (Gelb, 1963, pp. 110-111); one of the 
phonetic complements for /ku³/ in Chinese, as we have seen, derives from the
logogram for /ku³/, drum; and the Japanese kana for /mo/ derives from the
character for /mo/, hair, borrowed from Chinese /mao²/, hair (Jensen, 1970,
p. 201). But is derivation necessarily to be equated with evolution? Gelb
himself makes it quite clear that there is no period in any of these
traditions during which the writing was strictly logographic; syllable signs occur
in the earliest specimens (Gelb, 1963, pp. 67, 83, 85). Nor did any of these
traditions lead eventually to a strict syllabary, though some of the later
Mesopotamian systems came fairly close (p. 165).

The Cretan tradition is perhaps the one case that supports the claim. 
Whether there was a strictly logographic stage cannot be determined until the 
early Minoan scripts are deciphered, but the strictly syllabic Cypriote 
orthography appears to have developed from the earlier word-syllabic stage 
represented by Cretan Linear B (Gelb, 1963, p. 154). 

Finally, Gelb's theory claims that alphabetic writing evolves from syllabic
writing. But this part of the theory depends crucially on Gelb's particular
interpretation of the structure of the Egyptian and West Semitic
orthographies, and on his presumption that the latter derive from the former.

In the Afro-Asiatic family of languages, to which both Egyptian and
Semitic belong, the base morphemes are, in general, simply consonantal
patterns, for example, Egyptian n-f-r, lute; p-r, house; and Semitic k-t-b, to
write; m-l-k, to rule. In actual words, vowels are morphologically inserted
and, together with prefixes and suffixes, distinguish the various forms
derived from the base. Thus the base k-t-b yields in Hebrew [ka'tav], he wrote;
[jix'tov], he will write; [jik'atev], he will be inscribed; [mix'tav], letter;
[ktu'ba], marriage; and many other forms.

Egyptian writing is a mixture, often redundant, of logograms and signs 
for consonants and for sequences of two consonants. These consonantal and
biconsonantal signs are derived from the logograms by phonetization. Thus the
sign for d-t, snake, is used for the consonant /d/, and the sign for /w-r/,
swallow, is used for the consonantal sequence /w-r/ in writing /w-r-d/, to be
weary (Jensen, 1970, p. 60). There are no obviously syllabic signs. Vowels 
are not ordinarily indicated, but in special cases, such as foreign proper 
names, the signs for the consonants /?/, /j/, /w/ are used for vowels /a/, 
/i/, /u/, respectively. This assignment of consonantal signs to vowels is not 
arbitrary. /j/ is homorganic with /i/ and /w/ with /u/. While /?/ is not 
homorganic with /a/, it is nevertheless phonologically reasonable to 
transcribe the low back vowel with the sign for the glottal stop, the lowest 
and most back consonant. As with Sumerian and Chinese, there appears to be no 
historical period during which the writing is strictly logographic; the 
consonantal signs are there from the first (Gelb, 1963, p. 74).

Ancient Semitic writing consists simply of signs that ordinarily stand
for single consonants: thus Hebrew [ka'tav] is written ktb, and [mix'tav],
mktb. But as with Egyptian, consonantal signs are used, when necessary, to
indicate vowels: the signs for /?/, /j/, /w/ (aleph, yod, and waw) could
indicate /a/; /i/ or /e/; and /u/ or /o/, respectively. This device was used not
only for proper names, [da'wid], David, being written dwjd, but also to avoid
ambiguity in other words, [jix'tov] being written jktwb to distinguish it
from [jik'atev], written jktb.

Pace Gleitman and Rozin, it was surely not the case that the West Semites
didn't "notice" the vowels in their language (1977, p. 19): when it was im- 
portant to write the vowels, they wrote them. On the contrary, what is espe- 
cially significant about the Afro-Asiatic languages is that their morphologi- 
cal structure must have fostered awareness of segmental structure to a far 
greater degree than in the case of Indo-European languages. As I have argued 
elsewhere, such "linguistic awareness" is not automatic and is essential for 
alphabetic reading and writing (Liberman, Liberman, Mattingly, & Shankweiler, 
1980; Mattingly, 1972). 

The reason that both Egyptian and Semitic could be written without con- 
sistent indication of vowels is that, in general, the vowels carried only 
inflectional information. Since word-order is relatively fixed, this informa- 
tion is for the most part redundant. On the other hand, in Greek and in 
Indo-European languages generally, the base morphemes include vowels. Thus, 
when the Phoenician alphabet was adapted to Greek, it became a plene alphabet:
vowels as well as consonants were regularly transcribed, aleph, yod, and waw
being used for /a/, /i/, and /u/ as before, and three other Phoenician
consonantal signs, he, /h/, heth, /ħ/, and ayin, /ʕ/, for /e/, /e/ and /o/,
respectively. 

To maintain his theory of orthographic evolution, Gelb has to argue, 
since there are no preceding West Semitic logographies or syllabaries, that 
the West Semitic scripts derive from the Egyptian. And since he denies the
direct development of an alphabet from a logography, he has to argue that the
Egyptian consonantal and biconsonantal signs are really syllabic.

In asserting the derivation of the West Semitic script from the Egyptian, 
Gelb very properly rejects the far-fetched attempts of other scholars to
demonstrate similarities in the forms of the signs of the two scripts. His
argument relies on the similarity of "inner structure" (p. 146), that is, the 
use of a limited set of signs to express consonants but not (ordinarily) vow- 
els. But this argument loses what force it might have in view of the fact 
that it is the same peculiarity in morphological structure that made it
possible for both languages to be written in this way. Gelb might have adduced a
further similarity of inner structure: when vowels did have to be written, 
the signs for the same three consonants, /?/, /j/, and /w/, were used to write 
the same three vowels, /a/, /i/, and /u/. But the similarity of Egyptian and 
West Semitic phonological inventories explains this. Since both had the con- 
sonants /?/, /j/, /w/ phonologically related to the vowels /a/, /i/, /u/, 
respectively, the signs for these consonants were the obvious choices to write 
the corresponding vowels. Though the possibility cannot be ruled out, there 
is no need, in the absence of other evidence, to conclude that West Semitic 
script is derived from Egyptian script. The linguistic similarity of the 
Egyptian and Semitic languages is quite sufficient to account for the 
similarity of the two scripts. 

As for the Egyptian consonantal signs, Gelb's proposal is that each of
them represents a set of syllables or disyllables with the same consonants but
different vowels. This may seem a distinction without a difference, but for
Gelb it is crucial:

The Egyptian phonetic, non-semantic writing cannot be consonantal, 
because the development from a logographic to a consonantal writing, 
as generally accepted by Egyptologists, is unknown and unthinkable 
in the history of writing, and because the only development known 
and attested in dozens of various systems is that from a logographic 
to a syllabic writing. (pp. 78-79; original in italics)

But, obviously, this argument is entirely circular; only the theory itself 
justifies the syllabic interpretation. One might have supposed that the West
Semitic scripts, at least, could be allowed to be alphabetic without damage to 
the theory, but to concede this would obviously undermine the claim of inner 
structural similarity between them and the Egyptian script. Thus the West 
Semitic script must be syllabic, too, waw, for example, being transliterated
wa, wi, wu (Gelb, 1963, p. 148), and the development of alphabetic writing
must await the Greeks.

This claim is not only uncorroborated; it also makes it much more diffi- 
cult to account for the emergence of the Greek plene alphabet. If the Phoeni- 
cian orthography was syllabic, there is no particular reason why the Greeks, 
any more than other Indo-Europeans, should have become aware of the segmental
character of their language when they borrowed this orthography. We should
expect to find them using, at least at first, a patched-up syllabary like that
of the Persians. But if it is recognized that the West Semites, thanks to the
peculiar morphology of their language, had already arrived at the alphabetic 
principle, then the development of the Greek alphabet from the Phoenician al- 
phabet can be seen to be simply a matter of adding two more vowel signs and 
using them consistently. 

If we do not accept the claim for the development of West Semitic writing 
from Egyptian writing, and for the syllabic nature of at least the latter, 
then Gelb's theory is in trouble, for it would seem that, insofar as deriva- 
tion can be equated with evolution, an alphabet can evolve from a logography 
without an intervening syllabic stage, as in the case of Egyptian; and may 
even, perhaps, emerge without any precursors, as in the case of West Semitic; 
but that no alphabets have developed from syllabic or word-syllabic systems, 
for apart from the Ugaritic cuneiform alphabet, of unknown origin (Gelb, 1963, 
p. 129), all other alphabets are derived, directly or indirectly, from the West
Semitic consonantal alphabets. 

The theory of orthographic evolution cannot be correct, for logography 
cannot be shown to have evolved from picture writing in any meaningful sense; 
syllabaries do not generally develop from logographies; and alphabets do not 
develop from syllabaries. What we find instead are either logosyllabic
traditions: Mesopotamian, Hittite, Cretan, and Sino-Japanese; or alphabetic
traditions: Egyptian and West Semitic. We can, if we choose, regard as
evolutionary the development of non-iconic logograms from iconic ones, or the
development of the Greek plene alphabet from the Phoenician consonantal alphabet, but
these are not the sorts of evolution the theory calls for. 

But without the theory, how can we account for the variety of
orthographies? Let us consider this question from a rather different point of
view. The orthography of a language must be productive; that is, it must
enable the user to write any of the infinite number of possible utterances of
the language. Because there are many levels at which an utterance is mentally 
represented in production and perception, there are, in principle, many possi- 
ble forms that a productive orthography might take. For example, any utter- 
ance of a particular language (in fact, any utterance of any language) can be 
written in a general system of phonetic transcription. If such a transcrip- 
tion were used as an orthography for all languages, any literate person could 
read aloud in any language. Or one could imagine an orthography that would be 
based on the acoustic properties of utterances (cf. the "visible speech" of 
Potter, Kopp, & Green, 1947, and the stylized spectrographic patterns used for
speech synthesis by rule at Haskins Laboratories by Liberman, Ingemann, Lisker,
Delattre, & Cooper, 1959); such an orthography would include just the
information on which the listener to spoken language relies. Or one could
imagine an orthography based on the semantic representations of utterances
(cf. Katz & Fodor, 1963), if indeed such representations really exist (Fodor,
Fodor, & Garrett, 1975); after all, it is the meaning, not the linguistic
structure, that the writer really wants to convey to the reader. But it is
obvious that none of these alternatives would do for a practical orthography,
though it is not easy to say exactly why (see Mattingly, 1984, for some
speculations).

There is in fact a very severe limitation on orthographic variety. In 
practical orthographies, only one basic principle has ever been used, that of 
transcribing utterances of a language as sequences of lexical items, that is, 
words. I would argue that all known orthographies are in this sense lexical,
varying only in the specific ways in which they happen to transcribe the 
words. The lexical character of logographies seems obvious, but it might be 
objected that alphabetic systems are essentially transcribing the phonemes of 
utterances, and only incidentally the words. With a well-behaved orthography, 
like that of Serbo-Croatian, only the spaces between the words indicate its 
specifically lexical character. The point becomes clearer in the case of an 
eccentric orthography, like that of English, in which there is usually more 
than one way to write a particular sound. Thus English [ay], phonologically
/ī/, can be written -igh-, -y, -y(-)e, -i(-)e, -uy. But despite this
variability, there is but one way of writing each of the words sight, try, lye, dyne,
lie, lime, buy.

A lexical orthography can only be productive if it incorporates a system 
for transcribing all the words in the language. There is, however, no
principle that can specify just the actual words of a language, and provide
the basis for such a system. Thus /Cayf/, v., to gather truffles on Wednesday,
could perfectly well be an English word; its absence from the lexicon is
accidental. Nor, since the membership of the lexicon, though finite in theo- 
ry, is indefinite in practice, would it be satisfactory simply to list all the 
words and provide an arbitrary sign for each. Any word that was inadvertently 
omitted, or entered the language after the list was compiled, would be 
unwriteable. And the writer who could not remember the sign for a word that 
was on the list would be driven to paraphrase. Thus there can be no strict
logographies, for a strict logography would not be productive; and accordingly
no such stage is actually found in Sumerian, Egyptian, Hittite, or Chinese.

There is, however, a way to specify all possible words in a language.
The phonetics and phonotactics of a language determine the set of phonological 
forms that qualify for membership in its lexicon. Thus, while /Sayf/ could be 
a word in English, and /kæt/ really is one, /AUc/ and /stwoyg/ could not be.
By exploiting the phonological structure of the language, that is, by some 
form of phonetization, an orthography insures that any possible word can be 
transcribed. This does not mean that a writer will always know the standard 
way to write a particular word, or that the reader will always know what word 
is transcribed by a particular orthographic form. It does not preclude a 
particular word's being standardly transcribed in some exceptional or arbi- 
trary way, e.g., one. What it does mean is that if /Sayf/ should enter the 
English language, there will be at least one, in fact several, ways to write 
it; that the writer who cannot recall the standard spelling of cat can at 
least write kat, and that the reader confronted with a word unfamiliar in its 
written form will have a basis for guessing what the word is. 

Although lexical items have syntactic and semantic as well as phonologi- 
cal properties, only the last allow the specification of the set of possible 
words of a language. Syntactic properties are not sufficient to specify
different words uniquely, and a principled characterization of word meaning has
thus far eluded the efforts of linguistic semanticists (Fodor, 1977, chap. 5). 
As we have seen, however, semantic properties can nonetheless play a useful 
auxiliary role in orthographies. 

Every orthography, then, achieves productivity by incorporating some
system for transcribing phonologically the possible words of the language. Since
the only relevant phonological units are syllables and phonemes, there are re- 
ally only two ways to do this: the syllabic way and the alphabetic way, and 
we have seen that all orthographies make use either of the one or the other. 
But why must there be even two ways? Why are not all orthographies plene 
alphabets? The answer is that, to a large extent, the morphological and 
phonological structure of a language defines the orthographic options. There 
are some languages for which a plene alphabet would be cumbersome and redun- 
dant, and others for which there is no really satisfactory method of phoneti- 
zation. Moreover, the alphabetic option becomes an obvious one only under 
rather special linguistic circumstances. 

A Semitic language, unless it has borrowed heavily from a non-Semitic 
language, has no need of a plene alphabet. Since lexical items are
consonantal patterns, the vowels carrying only inflectional information, an extremely
parsimonious system of phonetization is possible, as the West Semitic
orthographies demonstrate. Under similar linguistic circumstances, Egyptian 
writing was able to achieve productivity in much the same way. The extensive 
and often redundant use of logograms does not alter the fact that the 
uniconsonantal and biconsonantal signs are the true basis of this orthography. 

Because of their restricted syllable structure, Sumerian, Chinese and 
Japanese are less orthographically amenable. Japanese has only 74
phonotactically possible syllables (or more exactly, moras). Chinese has about 1200
possible syllables, but by no means all of them are actually used. Sumerian 
appears to have been similarly restricted. Restricted syllable structure 
surely promotes awareness of syllables, and in these cases a syllabary might 
seem to be the obvious phonetization device. But the morphological
consequence of restricted syllable structure, unfortunately, is pervasive homophony,
exacerbated when, as in Chinese and Sumerian, the base morphemes are mostly 
monosyllabic. For example, there are 38 different Chinese words with the
phonological form /li 4 / (Wieger, 1927). Under these circumstances, a strict
syllabary is hardly practical, for it would give rise to pervasive homography,
far less tolerable in writing, because of the lack of prosodic information to
help specify syntactic structure, than pervasive homophony in speech. For
these languages, a word-syllabic system, in which the ambiguity of syllable 
signs is reduced with the help of logograms, is a reasonable, if not highly 
efficient solution. Alphabetic writing would be no improvement. To replace 
the syllabic signs in Chinese and Japanese writing by alphabetic ones would do 
nothing to reduce homography, and to use only an alphabet to write these
languages, convenient though it might be for printers, would be disastrous for
readers.

For many other languages, a plene alphabet is the most efficient system
of phonetization. But the alphabetic principle is not an obvious one. It did 
not occur to the Hittites, who used a word-syllabic system even though they 
did not have a homophony problem and could have used an alphabet. It occurred 
to the Egyptians and the West Semites only because the peculiar morphological
character of their languages made them aware of phonological segments.
It is certainly owing entirely to the West Semitic example that alphabetic 
writing is now so widespread. 

It would, however, be pressing the point too far to say that variations 
in linguistic structure account for all orthographic variety. Non-linguistic 
factors assuredly play a role. The Akkadians, for example, spoke a Semitic 
language and would certainly have been well advised to use a consonantal
alphabet. But being impressed by the culture of the Sumerians, they adopted the
Sumerian orthography and made writing unnecessarily complicated for themselves
and their Mesopotamian successors (Jensen, 1970, p. 94). Greek speakers on
the island of Crete used a word-syllabic system, Linear B, no doubt influenced
by the example set by the speakers of the unknown Minoan language written in 
Linear A (Gelb, 1963, p. 91 ff.). The bewildering complexities of the 
Japanese kanji, borrowed from the Chinese, have a similar historical explana- 
tion (Martin, 1972). 

To summarise, Gelb's widely accepted theory of orthographic evolution 
must be rejected. Orthography has no relationship to picture-language, and 
there is no sequential development from logography to syllabary to alphabet. 
The forms that orthographies have taken are constrained by the requirement 
that they must be productive, and must transcribe lexical items. The limited 
variety of orthographies can be explained largely on linguistic grounds. 

THE DEVELOPMENT OF CHILDREN'S SENSITIVITY TO FACTORS INFLUENCING VOWEL READING



Danielle R. Zinna,† Isabelle Y. Liberman,†† and Donald Shankweiler††



Abstract. To disambiguate vowel assignment to a vowel digraph in a
word, readers must take into account aspects of the word context
beyond the vowel digraph units themselves. The present study examined
the development of young readers' use of this context in two
experiments. In the first experiment, first-, third-, and fifth-grade
children were required to read aloud high- and low-frequency words
containing vowel digraph units with variant and invariant
pronunciations. Words containing vowel digraph units with variant
pronunciations were further categorized by the uniformity of pronunciation of
the vowel digraph-final consonant unit as it appeared in real words
(i.e., the orthographic neighborhood consistency).

While word reading accuracy of all groups was enhanced by word 
frequency, only the third and fifth graders demonstrated sensitivity
to variation in pronunciation of the vowel digraph unit. For these 
children, low-frequency words containing vowel digraph units with 
invariant pronunciations were read with accuracy comparable to that 
obtained for the high-frequency words. In contrast, low-frequency 
words containing vowel digraphs with variant pronunciations were 
still a significant source of error for the older readers, but 
chiefly when they came from inconsistent orthographic neighborhoods. 

In a second experiment, pseudoword stimulus items were used to 
examine further the effect of the orthographic neighborhood on vowel 
pronunciation. The influence of the vowel digraph-final consonant 
unit in determining pronunciations was again indicated by limited 
variability in pronunciations of pseudowords ending in particular
vowel digraph-final consonant units from consistent orthographic
neighborhoods. Where there was variability in pronunciation, the
initial consonant-vowel digraph structure appeared to be largely
responsible. Both experiments support the hypothesis that with
reading experience, children identify the systematic relationship 
between pronunciation and orthographic structure and utilize that 
knowledge in the pronunciation of unfamiliar words. 

Analysis of the errors made by children as they acquire skill in word
reading has provided some clues to the problems beginning readers encounter in 
identifying words. The well-documented finding that, in English, vowel mis- 
readings occur with greater frequency than consonant misreadings (Fowler, 
Liberman, & Shankweiler, 1977; Shankweiler & Liberman, 1972; Weber, 1970) sug- 
gests that beginners in English experience particular difficulty in associat- 
ing a given orthographic vowel unit with its appropriate pronunciation. 

A number of explanations have been proposed to account for the difference 
in difficulty between vowels and consonants (Fowler, Shankweiler, & Liberman, 
1979; Shankweiler & Liberman, 1976). One explanation emphasizes the
differences in the linguistic properties of vowels and consonants in speech
production and perception, noting that vowels are more fluid and generally less
categorically defined than consonants (Liberman, Cooper, Shankweiler, &
Studdert-Kennedy, 1967). Another explanation turns on the difference between
vowel and consonant orthography. The preponderance of errors on vowels has 
been attributed to the fact that the same vowel may be spelled differently in 
different words. Consonants, on the other hand, have a more nearly one-to-one
correspondence between orthographic unit and phonological segment. The conso- 
nant letters, with few exceptions, cue the same phonological segments wherever 
they occur, whereas the letters that represent vowels frequently have multiple 
phonological referents (Venezky, 1967). Further support for the role of the 
orthography, rather than the differences in vowel and consonant perception, in 
accounting for the vowel error pattern is reported by Lukatela and Turvey 
(1980). In their examination of word reading errors in Serbo-Croatian, an 
orthography that includes a simple vowel set but a more complex consonant set,
phoneme substitutions on medial vowel segments were less frequent than
substitutions on initial or final consonant segments. 

In view of the complexity of the English vowel orthography, it is hardly 
surprising that there are more vowel errors than consonant errors in reading
English words. In order to disambiguate the vowel pronunciation, readers must 
take into account aspects of the word contexts that are represented by the 
letters surrounding the vowels. Beginners' errors show that they have not yet
learned to do this, but use instead grapheme-phoneme correspondences for
single vowel letters (Fowler et al., 1979). With age and experience, children
narrow the range of vowel renderings with greater and greater precision,
taking more account of the surrounding letter context (Fowler et al., 1979).

In English, these surrounding letter contexts differ in the extent to 
which they constrain the selection of the appropriate vowel. The context may
be tightly constrained, as in the tense or long pronunciation for orthographic
vowel units appearing in the context of the silent-e marker. Or it may be
loosely constrained in a vowel digraph that may have several appropriate
realizations within a particular context. For example, the vowel digraph ou in
the context of gh may be correctly rendered as /au/ in bough, /ʌ/ in tough,
/ɔ/ in thought, /u/ in through, or /o/ in though. In the Fowler et al.
studies, although the stimuli included a wide range of contextual constraints, the
possibly differing effects among them were not considered.

Attempts to construct a model for predicting adults' pronunciations of
pseudowords containing vowel digraph units (Johnson & Venezky, 1976; Ryder & 
Pearson, 1980) have suggested that the vowel pronunciation could be influenced 
either by the frequency of occurrence of that unit without regard to the
context, that is, without regard to the effect of the final consonant or,
alternately, by the context provided by the final consonant. Results of those
investigations support a model predicting that adult pronunciation is highly 
determined by frequency of orthographic patterns, but the functional unit is 
hypothesized to be the vowel digraph-final consonant structure. 

Skilled adult readers have in fact been shown to be sensitive to the con- 
sistency or inconsistency of the pronunciation of medial vowel-final letter 
units (Glushko, 1979). Glushko has proposed that, in the course of reading a 
word, an entire neighborhood of similarly structured words and their 
pronunciations is automatically activated in memory. Glushko's "neighborhood"
includes all monosyllabic words in the reader's lexicon that share the same 
medial vowel letters in combination with the same letter units in word final 
position. Rhyming words such as seam, beam, and team, sharing both the medial
vowel-final letter unit and a uniform pronunciation, would thus constitute a
consistent orthographic neighborhood; whereas the words beat, threat, and
great, although sharing the medial vowel-final letter unit, fail to share a
uniform pronunciation, and thus would be classified as constituting an incon- 
sistent orthographic neighborhood. Glushko's adult readers' performance was 
influenced by the consistency or inconsistency in orthographic neighborhoods 
as evidenced by more rapid reading and more limited variation in pronunciation 
of words and pseudowords from consistent orthographic neighborhoods (i.e., 
words of similar structure sharing a uniform pronunciation). It was also
indicated by a greater latency of response and significant variation in 
pronunciation of words from inconsistent orthographic neighborhoods (i.e., 
words of similar structure that fail to share a uniform pronunciation). 
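
Glushko's classification can be made concrete with a small sketch. The Python below is only an illustration: the toy lexicon, the rough pronunciation labels, and the function names are assumptions introduced here, not materials or procedures from Glushko (1979). It groups monosyllabic words by the letters from the first vowel letter to the end of the word, and calls a neighborhood consistent when every member pronounces that unit in the same way.

```python
from collections import defaultdict

# Toy lexicon (hypothetical): spelling -> rough label for the pronunciation
# of the medial vowel-final letter unit. Labels are illustrative only.
TOY_LEXICON = {
    "seam": "im",
    "beam": "im",
    "beat": "it",
    "threat": "et",
    "great": "eyt",
}

VOWEL_LETTERS = set("aeiou")


def body(word):
    """Return the spelling from the first vowel letter to the end of the word."""
    for index, letter in enumerate(word):
        if letter in VOWEL_LETTERS:
            return word[index:]
    return word  # no vowel letter found; treat the whole word as the unit


def neighborhoods(lexicon):
    """Group words that share the same medial vowel-final letter unit."""
    groups = defaultdict(dict)
    for word, pronunciation in lexicon.items():
        groups[body(word)][word] = pronunciation
    return dict(groups)


def is_consistent(neighborhood):
    """True when every word in the neighborhood pronounces the unit alike."""
    return len(set(neighborhood.values())) == 1


if __name__ == "__main__":
    for unit, members in neighborhoods(TOY_LEXICON).items():
        label = "consistent" if is_consistent(members) else "inconsistent"
        print(unit, sorted(members), label)
```

With this toy lexicon, seam and beam fall into a consistent -eam neighborhood, whereas beat, threat, and great form an inconsistent -eat neighborhood, matching the examples in the text.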

The vowel digraph unit in many words may be ambiguous unless the reader 
can exploit additional cues from the other letters in the word. The broader 
context, as for example, the final consonant, may supply such cues. Whether 
or not it does could depend on whether word items from an orthographic 
neighborhood for that vowel digraph-final consonant unit share a consistent 
pronunciation. Thus, the final consonant might be used to disambiguate the 
vowel digraph, but its use would involve a complex context-sensitive opera- 
tion. 

A study examining this skill in second-, fourth-, and sixth-grade
children (Johnson, 1970) found that the factor most likely to influence children's
selections was also the frequency of occurrence of a particular pronunciation
for a given unit, and further, that with increasing grade level, children's
responses more closely reflected the pronunciations of those units as they
appear in real words. Though mention is made of some additional effects of the
final consonant context and the position of the vowel digraph unit within the
word, the study was not designed to investigate the development of the
influence of context on children's selections as a result of reading experience.
Nor did it examine the effects of the frequency of occurrence of the vowel di- 
graph-final consonant structure and the consistency of pronunciation of that 
structure in real words. 

To date, there has been no systematic study of the development of chil- 
dren's use of the final consonant context in disambiguating vowel assignment 
to vowel digraph units and their sensitivity to orthographic neighborhood
consistency. An examination of these effects with children may provide insight
into the development of children's awareness of the very complex relationship
between the orthography and the phonology. In addition, it would also assist 
us in understanding how normally developing readers use the reading vocabulary 
they have mastered to develop strategies to identify unfamiliar words. 

In order to explore these questions, two experiments were conducted. In 
the first experiment, development of children's understanding of vowel digraph 
pronunciation was the focus. First-, third-, and fifth-grade children were
required to read aloud high- and low-frequency words containing vowel digraph
units with variant and invariant pronunciations. For each grade, an examina- 
tion of error rate and of the characteristics of errors was conducted to ex- 
plore the effects of word frequency, of alternate pronunciations for vowel di- 
graph units, and of consistency of orthographic neighborhood on word reading 
accuracy. The second experiment investigated other influences on vowel di- 
graph reading using pseudowords containing vowel digraph units that have vari- 
ant pronunciations in words. By eliminating the factor of word familiarity, 
pronunciation preferences for vowel digraph units, as well as factors 
influencing those pronunciations, could be studied and the results compared 
with those obtained on the real word reading task. 



General Method

Subjects

The subjects in the first experiment were children from the first-, third-,
and fifth-grade classes of a suburban public school system in Connecticut.
Following a review of teacher ratings for reading achievement for the first
and third graders, and teacher ratings and group reading achievement test
scores for the fifth graders, a pool of subjects, all average or above-average
readers, was identified. The final population consisted of 90 students, 30
from each grade level. The subjects participating in the second experiment
were the 30 third-grade children who had participated in Experiment 1. All
subjects selected were native English speakers with no known hearing or vision
impairments.

Procedure

The children were tested individually in two 30-min sessions. During the
first session, the experimental word reading task was presented. The words
were typed in lower case primary type on 4" x 6" file cards secured in a ring
binder. The stimuli were presented in random order with 20 filler words,
which were single syllable items selected from the reading subtest of the Wide
Range Achievement Test (Jastak, Bijou, & Jastak, 1978). These filler words
were included in order that the randomization satisfy the constraint that
words with the same vowel sound not precede one another, thus minimizing
possible priming effects. Subjects were instructed to read each word orally
and then to turn to the following card. Approximately two weeks after the
initial session, a second session was held for the third-grade children, during
which the experimental pseudoword reading task was presented. Subjects were
informed that these words were nonsense or "pretend" words and that they
should not attempt to make real words out of the items. They were instructed
to read each word orally and to turn to the following card after reading each
word. All pronunciations were recorded on tape for later transcription and
analysis.
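The ordering constraint described in the procedure (no two successive cards
sharing a vowel sound) can be met in several ways; the report does not say how
the order was actually generated. The following is a minimal sketch, assuming
simple rejection sampling. The function name, the vowel labels, and the two
filler words are hypothetical and serve only as illustration.

```python
import random

def constrained_order(items, vowel_of, seed=None, max_tries=10_000):
    """Return a random order in which no two neighbors share a vowel sound.

    Rejection sampling: reshuffle until the constraint holds. This is an
    illustrative assumption, not the procedure documented in the report.
    """
    rng = random.Random(seed)
    order = list(items)
    for _ in range(max_tries):
        rng.shuffle(order)
        # Compare every card with the card that follows it.
        if all(vowel_of[a] != vowel_of[b] for a, b in zip(order, order[1:])):
            return order
    raise RuntimeError("no admissible order found; add more fillers")

# Hypothetical vowel labels; "cat" and "bed" stand in for filler words.
VOWEL_OF = {
    "green": "i", "street": "i", "road": "o", "coal": "o",
    "soil": "oi", "join": "oi", "cat": "ae", "bed": "eh",
}

order = constrained_order(VOWEL_OF, VOWEL_OF, seed=1)
```

With few fillers relative to the number of items sharing a vowel, an admissible
order may be rare or impossible, which is presumably why filler words were
needed to satisfy the constraint.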
Experiment 1

Materials 

Two lists of monosyllabic real words, including 72 items in all, were
developed. One list, as displayed in Table 1, included words containing vowel
digraph units with invariant phonological correspondences: ee, oa, oi, ai, and
ew. The other list, as displayed in Table 2, included words containing units
with variant correspondences: ea, ou, ow, ie, and oo. Words were selected to
vary in two respects: frequency and variability of pronunciation of the vowel
digraph unit. Frequency was determined by the occurrence of the words in
reading material at the third-grade level as indicated in the American
Heritage Word Frequency Listings (Carroll, Davies, & Richman, 1971).
Classification according to variant or invariant pronunciation was based on
the pronunciations reported in a thorough listing (Fischer, 1979) of
monosyllabic English words containing vowel digraphs. Both word frequency and
pronunciation variability were systematically controlled in both stimulus
lists.

In addition, as indicated in Table 2, for each monosyllabic word containing
a vowel digraph unit with a variant pronunciation, the word's orthographic
neighborhood was determined from the Fischer set in the manner of Glushko
(1979). This determination was made for both high- and low-frequency words.
Each word with a vowel digraph-final consonant unit that is pronounced the
same way in all monosyllabic words sharing that structure was considered to
have a consistent orthographic neighborhood. In contrast, each word with a
vowel digraph-final consonant unit that is pronounced differently in at least
one other monosyllabic word sharing that structure was considered to have an
inconsistent orthographic neighborhood.
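The classification rule just described can be sketched in a few lines. This is
only an illustration under stated assumptions: the toy lexicon below is a
hypothetical stand-in for the Fischer (1979) listing, the vowel labels ("i",
"eh", "ei") are ad hoc ASCII codes, and the function name is invented here.

```python
from collections import defaultdict

def classify_neighborhoods(lexicon):
    """Label each vowel digraph-final consonant unit as consistent or not.

    lexicon: iterable of (word, unit, vowel) triples, where `unit` is the
    vowel digraph plus final consonant(s) and `vowel` is how the digraph
    is pronounced in that word.
    """
    pronunciations = defaultdict(set)
    for _word, unit, vowel in lexicon:
        pronunciations[unit].add(vowel)
    # One pronunciation across all words sharing the unit means consistent.
    return {
        unit: "consistent" if len(vowels) == 1 else "inconsistent"
        for unit, vowels in pronunciations.items()
    }

# Hypothetical mini-lexicon; a real run would use a full word listing.
TOY_LEXICON = [
    ("beach", "each", "i"), ("reach", "each", "i"), ("teach", "each", "i"),
    ("read", "ead", "i"), ("head", "ead", "eh"), ("plead", "ead", "i"),
    ("speak", "eak", "i"), ("break", "eak", "ei"), ("steak", "eak", "ei"),
]

neighborhoods = classify_neighborhoods(TOY_LEXICON)
```

Under this rule a single exception word is enough to make a neighborhood
inconsistent, so the outcome depends heavily on how complete the word listing
is.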

Results and Discussion

Because the variance in performance was substantially greater for the 
first graders than for the third and fifth graders, a separate analysis was 
carried out for each grade level group. Mean percentages of correct re- 
sponses, possible pronunciation responses, and error responses were calculated 
for each grade on each word category. These data appear in Table 3. The data
for each grade were subjected to two separate factorial analysis of variance 
procedures. The first analysis examined factors of word frequency and 
pronunciation variability for the vowel digraph unit for the entire set of 
stimuli. In the second analysis the factors of word frequency and consistency 
or inconsistency of the orthographic neighborhood were examined for the vari- 
ant pronunciation set of words. 

Effects of Frequency and Vowel Digraph Pronunciation

First graders. The analysis of the first graders' data revealed, as
expected, a significant main effect for word frequency, F(1,29) = 45.89, p <
.0001. As illustrated in Table 3, these children correctly identified 65% and
63% of the high-frequency words containing vowel digraph units with variant
and invariant pronunciations, respectively. Thus, it appears likely that the
first graders employed a holistic word reading strategy. In contrast,
identification was correct for only 50% of the low-frequency words containing
vowel digraph units with invariant pronunciations and 43% of the low-frequency
words containing vowel digraph units with variant pronunciations. Thus, while these
Table 1

Real-Word Stimulus Items with Invariant Pronunciations of the Vowel Digraphs
(Experiment 1)

High Frequency    Low Frequency

green             sleek
street            breed
road              oat
coal              boast
soil              toil
join              joint
paint             ail
main              trait
drew              dew
flew              slew


Table 2

Real-Word Stimulus Items with Variant Pronunciations of the Vowel Digraphs
(Experiment 1)

   Consistent Orthographic         Inconsistent Orthographic
        Neighborhood                     Neighborhood

High           Low                 High           Low
Frequency      Frequency           Frequency      Frequency

beach          ream                read           tread
clean          dean                speak          steak
                                   break          teak
                                   head           plead

young          mount               mouth          youth
found          spout               touch          uch
                                   group          vouch
                                   proud          soul

tried          fried               owl            flown
piece          niece               how            tow
pie            lied                bowl           jowl
field          shield              lew            pow

soon           croon               foot           loot
room           sloop               food           hood
                                   good           mood
                                   shoot          soot





Zinna et al. : Sensitivity to Factors Influencing Vowel Reading 



first-grade readers correctly identified nearly two-thirds of the
high-frequency words containing vowel digraph units with invariant
pronunciations, they did not generalize that knowledge in assigning the
correct pronunciation to identical vowel digraph units with invariant
pronunciations embedded in the less familiar, low-frequency words.

Further analysis suggests that the first-grade readers were nonetheless
beginning to acquire an awareness of alternate pronunciations for vowel
digraph units with variant pronunciations. In reading high-frequency words
containing vowel digraph units with variant pronunciations, first graders, as
noted above, correctly identified 65% of the words; however, 58% of their
error responses consisted of substitutions of possible alternate
pronunciations for that vowel digraph unit. Error data obtained from their
reading of low-frequency words containing vowel digraph units with variant
pronunciations offer corroborative evidence for this finding. Although the
overall error rate for reading low-frequency words containing vowel digraph
units with variant pronunciations was substantially greater than that obtained
for the high-frequency words, 53% of these errors (again greater than one-half
of the total) consisted of substitutions of possible alternate pronunciations
for the vowel digraph unit.

Third graders. As was the case for first graders, analysis of
third-grade data again revealed a significant main effect for frequency,
F(1,29) = 55.^6, p < .0001. In addition, a significant main effect for
pronunciation for the vowel digraph unit, not present in the analysis of the
first-grade data, was obtained with the third graders, F(1,29) = 59.93, p <
.0001. As illustrated on the left in Figure 1, an interaction between word
frequency and pronunciation for the vowel digraph unit was obtained, F(1,29) =
23.54, p < .0001.

Like the first graders, the third-grade readers read high-frequency words
containing vowel digraph units with variant and invariant pronunciations
equally well, though with greater accuracy than the first graders, correctly
identifying 96% and 98% of words in these categories, respectively. In
contrast to the first graders, the third graders read low-frequency words with
invariant pronunciations for the vowel digraph unit with accuracy comparable
to that obtained for the high-frequency words. They correctly identified 92%
of the low-frequency words of that orthographic type, suggesting that they had
been successful in identifying the systematic relationship between
pronunciation and orthographic structure among the words in their reading
vocabulary. Less dependent on previous knowledge of specific words, the third
graders demonstrated skill in generalizing knowledge of proper pronunciations
of invariant vowel digraph units when those units appeared in the context of
unfamiliar, low-frequency words.

In contrast to this performance on the invariant units, the third graders
were able to read accurately only 79% of the low-frequency words containing
vowel digraph units with variant pronunciations. Nonetheless, their overall
error rate in this category (21%) was substantially lower than that of the
first graders (57%). As in the first-grade pattern, however, a majority of
their errors (82%) consisted of substitutions of possible alternate
pronunciations for the vowel digraph unit. As illustrated in Table 3, while
the error rate declined from the first to the third grade, the ratio of
substitutions of possible alternate pronunciations to errors increased. Once again, the
Table 3

Frequencies and Percentages of Correct and Incorrect Responses for Real Words
Containing Variant and Invariant Vowel Digraph Units (Experiment 1)

                                  Grade 1        Grade 3        Grade 5
                                Freq.    %     Freq.    %     Freq.    %

Variant Unit

High-Frequency Words
  Total Correct                  509    65      751    96      766    98
  Errors
    Possible Pronunciations      158    20       25     3       14     2
    Impossible Pronunciations    113    15        4     1        0     0

Low-Frequency Words
  Total Correct                  332    43      618    79      712    91
  Errors
    Possible Pronunciations      235    30      133    17       57     7
    Impossible Pronunciations    213    27       29     4       11     1

Invariant Unit

High-Frequency Words
  Total Correct                  189    63      294    98      300   100
  Errors                         111    37        6     2        0     0

Low-Frequency Words
  Total Correct                  149    50      275    92      296    99
  Errors                         151    50       25     8        4     1

Figure 1. Performance of third and fifth graders on reading low-frequency and
high-frequency words, plotted in mean percent correct.
third-grade readers demonstrated skill in generalizing knowledge of
pronunciations for vowel digraph units to unfamiliar, low-frequency words.

Fifth graders. The main effects for word frequency and variant versus
invariant pronunciation for the vowel digraph unit were again revealed in the
analysis of the fifth-grade data, F(1,29) = 38.40, p < .0001, and F(1,29) =
59.39, p < .0001, respectively. As illustrated on the right in Figure 1, an
interaction between frequency and pronunciation for the vowel digraph unit was
again obtained, F(1,29) = 26.51, p < .0001. Though their performance was more
accurate overall, the pattern of the fifth graders was similar in one respect
to that of both earlier grades. That is, they read high-frequency words
containing vowel digraph units with variant and invariant pronunciations
equally well, correctly identifying 98% and 100% of the words of these
categories, respectively.

As observed previously with the third graders, the fifth graders
successfully identified the systematic relationship between pronunciation and
orthographic structure. Thus, they were able to generalize that knowledge to
the identification of words of lower frequency containing these invariant
units, correctly identifying 99% of the words of this category. As was the
case with the third graders, the fifth graders' reading of low-frequency words
containing vowel digraph units with variant pronunciations was poorer than
their reading of high-frequency words of that type: 91% of the words of this
category were correctly identified. Most of their errors (87%) consisted of
substitutions of possible alternate pronunciations for the vowel digraph unit
embedded within these words, a slightly greater percentage of such
substitutions than in the third grade (82%).

Summary. The analysis confirms the expectation that children's accuracy
in word reading would be enhanced by high word frequency, regardless of the
number of alternate pronunciations for the vowel digraph unit contained within
these words. In addition, the highly accurate performance of the third and
fifth graders in reading low-frequency words containing vowel digraph units
with invariant pronunciations supports the hypothesis that, with reading
experience, children identify the systematic relationship between
pronunciation and orthographic structure and utilize that knowledge in the
pronunciation of unfamiliar words. Finally, the proportion of errors that
were substitutions of possible alternate pronunciations rose with increasing
grade level, providing further evidence that as children develop reading
skill they identify the systematic relationship between pronunciation and
orthographic structure.
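The share of errors that were possible alternate pronunciations can be
recomputed from the response counts. The sketch below uses the error counts
listed in Table 3 for low-frequency words with variant units; the variable and
function names are illustrative only.

```python
# (possible, impossible) error counts for low-frequency words with variant
# vowel digraph units, by grade, as listed in Table 3.
ERROR_COUNTS = {1: (235, 213), 3: (133, 29), 5: (57, 11)}

def possible_share(possible, impossible):
    """Percentage of all errors that were possible alternate pronunciations."""
    return 100 * possible / (possible + impossible)

shares = {
    grade: round(possible_share(*counts))
    for grade, counts in ERROR_COUNTS.items()
}
```

Shares computed from raw counts can differ by a point or so from the figures
quoted in the text. For fifth graders the counts give about 84%, whereas the
quoted 87% matches the ratio of the rounded percentages, 7/(7 + 1). Either
way, the share rises from first to fifth grade.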

Effects of Frequency and Orthographic Neighborhood Consistency

A second analysis was conducted to examine the possibility that the error 
rate on categories of words that contained vowel digraph units with variant 
pronunciations was affected by the consistency of the orthographic neighbor- 
hood of individual words. Mean percentages of correct responses were 
calculated for each grade level group on each word category. These data
appear in Table 4.

First graders. For the first graders, the analysis revealed a significant
main effect for frequency, F(1,29) = 75.12, p < .0001. As indicated in
Table 4, they correctly identified 62% and 68% of the high-frequency words
Table 4

Mean Percentage of Correct Responses for High- and Low-Frequency Words
Containing Variant Vowel Digraph Units from Consistent and Inconsistent
Orthographic Neighborhoods (Experiment 1)

                               Orthographic Neighborhoods

                            Consistent          Inconsistent
Grade                      1     3     5        1     3     5

Variant Unit

  High-Frequency Words
    % Correct             62    98    99       68    95    97

  Low-Frequency Words
    % Correct             44    91    96       42    72    89


from consistent and inconsistent orthographic neighborhoods, respectively. In
contrast, they correctly identified only 44% and 42% of the low-frequency
words from consistent and inconsistent orthographic neighborhoods,
respectively. Once again, word frequency was the most predictive index of
word reading accuracy.

Third graders. Analysis of the third-grade data also revealed a significant
main effect for word frequency, F(1,29) = 76.79, p < .0001. However, a
significant main effect for orthographic neighborhood consistency, not found
in the analysis of the first-grade data, was also obtained, F(1,29) = 88.87,
p < .0001. As illustrated on the left of Figure 2, a significant interaction
occurred between word frequency and orthographic neighborhood consistency,
F(1,29) = 21.12, p < .0001. Like the first graders, the third-grade readers
read high-frequency words from consistent and inconsistent orthographic
neighborhoods equally well, though with greater accuracy than the first
graders, correctly identifying 98% and 95% of words from these categories,
respectively. When low-frequency words were presented, however, in contrast
to the first graders' error pattern, those words from consistent orthographic
neighborhoods were read with accuracy comparable to that obtained for the
high-frequency words. The third graders correctly identified 91% of the
low-frequency words from consistent orthographic neighborhoods, in contrast to
correct identification of only 72% of the low-frequency words from
inconsistent orthographic neighborhoods. This result suggests that the third
graders, but not the first graders, have developed a reading vocabulary
sufficient to provide a data base from which to determine the relations
between orthographic structure and pronunciation.
Figure 2. Performance of third and fifth graders on reading low-frequency and
high-frequency words with variant vowel digraph units, plotted in
mean percent correct.



Fifth graders. Main effects for word frequency and orthographic
neighborhood consistency were once again found in the analysis of the
fifth-grade data, F(1,29) = 37.^'i, p < .0001, and F(1,29) = 33.29, p < .0001,
respectively. As illustrated on the right in Figure 2, a significant
interaction between word frequency and orthographic neighborhood consistency
was again obtained, F(1,29) = 9.64, p < .0042. Though more accurate than the
first and third graders, the fifth graders also read high-frequency words from
consistent and inconsistent orthographic neighborhoods equally well, correctly
identifying 99% and 97% of words of these categories, respectively. Like the
third graders, the fifth graders, when presented with low-frequency words from
consistent and inconsistent orthographic neighborhoods, read words from
consistent neighborhoods with accuracy close to that obtained for the
high-frequency words. They correctly identified 96% of the low-frequency words
from consistent orthographic neighborhoods, as contrasted with correct
identification of 89% of the low-frequency words from inconsistent
orthographic neighborhoods. Once again, support is provided for the contention
that the analysis of interword relations and awareness of consistencies and
inconsistencies between orthographic structure and pronunciation, in this case
the vowel digraph-final consonant structure, provide the reader with the
knowledge necessary to pronounce an unfamiliar word correctly.
Experiment 2



The results of Experiment 1 provide evidence that older readers' accuracy
and error rate in reading real words containing vowel digraph units with
variant pronunciations were influenced by the consistency of pronunciation of
other words sharing the particular vowel digraph-final consonant unit. To
examine this effect further and to begin exploring the effect of the initial
consonant-vowel digraph unit on pronunciation selection, a second experiment
was conducted. In this experiment, the third-grade children who had
participated in the first experiment were asked to read monosyllabic
pseudowords containing vowel digraph units with variant pronunciations. By
eliminating the possibility of word familiarity, it was anticipated that
factors influencing reading would be more unequivocally revealed.



A list of 60 monosyllabic pseudowords was developed that contained vowel
digraph units with variant pronunciations: ea, oo, ou, ow, and ie. Each
pseudoword consisted of initial and final segments that might appear in real
words. The initial consonant-vowel digraph segment and the vowel digraph-final
consonant segment in each of the pseudowords represented a legitimate sequence
in English phonology. However, vowel digraph segments in the pseudowords
might have different pronunciations in different real word contexts. For
example, the ou unit in the pseudoword moung might be rendered like the ou in
mouth or the ou in young. For pseudowords constructed in this manner, each
item was reviewed to determine the consistency of pronunciation among
monosyllabic real words sharing the vowel digraph-final consonant unit. Of the
60 items, 36 pseudowords were determined to have consistent orthographic
neighborhoods, as evidenced by the uniformity of pronunciation among
monosyllabic real words sharing the particular vowel digraph-final consonant
structure (Fischer, 1979). The remaining 24 items were determined to have
inconsistent orthographic neighborhoods, as evidenced by the lack of
uniformity of pronunciation among monosyllabic real words sharing the
particular vowel digraph-final consonant structure (Fischer, 1979). The final
pseudoword lists are included in Tables 5 and 6.



The pronunciation preferences of the 30 third graders for reading each of
the pseudowords are listed as percentages in Tables 5 and 6. Vowel
digraph-final consonant units that were determined to have consistent
orthographic neighborhoods because of their uniform pronunciation in
monosyllabic real words are listed in Table 5. Items determined to have
inconsistent orthographic neighborhoods, based upon the lack of such
uniformity, appear in Table 6.
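Each percentage in these tables is a tally over the 30 readers' transcribed
responses to one pseudoword, so a single child corresponds to roughly 3
percentage points. A minimal sketch of that tally follows; the response list
is hypothetical and the labels "u" and "other" are ad hoc codes.

```python
from collections import Counter

def preference_percentages(responses):
    """Percentage of readers giving each vowel pronunciation for one item."""
    total = len(responses)
    return {
        vowel: round(100 * count / total)
        for vowel, count in Counter(responses).items()
    }

# Hypothetical transcriptions for one pseudoword: 27 of 30 readers gave the
# vowel coded "u" and 3 gave some other response.
example_responses = ["u"] * 27 + ["other"] * 3
percentages = preference_percentages(example_responses)
```

Because each value is rounded separately, the percentages for an item need not
sum to exactly 100.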

Influence of the Vowel Digraph-Final Consonant Unit

It is evident from Tables 5 and 6 that pronunciations for pseudowords
containing the vowel digraph units oo and ea tended to vary with the
designation of their orthographic neighborhood as consistent or inconsistent.
Pseudoword items containing the units -ooth, -oom, -oon, and -each, -ean, and
-eam, all considered to have consistent orthographic neighborhoods, were
usually pronounced as /u/ for the former and /i/ for the latter. These
pronunciations occurred in never fewer than 90% of the cases. In contrast,
the units -ool, -ood, -ook, and -ead, -eat, and -eak, all considered to have
inconsistent orthographic neighborhoods, were the source of considerable
variation in pronunciation. Pseudowords containing the oo unit received the
/u/ pronunciation in between 50% and 97% of the cases; items containing the ea
unit received the /i/ pronunciation in between 60% and 97% of the cases.



Table 5

Percentages of Total Responses to Each Item from Consistent Neighborhoods by
Vowel Digraph Pronunciation (Experiment 2)

Consistent Orthographic Neighborhood

                    Responses              Other Responses

             /u/     /ʊ/
mooth         90       0                         10
looth         91       3                          3
troom         97       3                          0
poom          91       3                          3
shoon         90       3                          7
smoon        100       0                          0
woon          93       0                          7

             /i/    /ei/     /ɛ/
meach         94       3       0                  3
slean         97       0       3                  0
chean         97       0       0                  3
team          94       3       0                  3

drief         67      33                          0
tiece         57      40                          3
criece        60      40                          0
biece         60      37                          3
fiece         70      27                          3


Certain units elicited the greatest variation in pronunciation. For
example, the realizations for the unit oo followed by k were evenly
distributed between /u/ and /ʊ/. For each of these items, the initial word
segments moo- and zoo- were words likely to be in a third-grade child's
reading vocabulary; however, the highly frequent words book and look, also
likely to be in a young child's reading vocabulary, provide the dominant
pronunciation for the unit -ook as it appears in monosyllabic real words.
These factors, in addition to these items' inconsistent orthographic
neighborhood, may account for the pronunciation alternation.
Table 6

Percentages of Total Responses to Each Item from Inconsistent Orthographic
Neighborhoods by Vowel Digraph Pronunciation (Experiment 2)

Inconsistent Orthographic Neighborhood

                         Responses                   Other Responses

             /u/     /ʊ/
              8.       7                                   13
smood         97       3                                    0
tood          73      17                                   10
zook          51      13                                    3
mook          50      17                                    3

             /i/    /ei/     /ɛ/
stread        60       0      10                            0
clead         80       0      17                            3
chead         77       0      23                            0
steat         97       0       3                            0
preat         90       3       3                            3
dreak         70      13      10
heak          91       0       3
treak         91       0       3                            3

             /u/     /o/    /au/     /ʌ/
touth         30      17      13       3                    7
aouch          0       7              13                    0
fouth          7      10               0                   23

            /au/     /o/
blowl         53      17                                    0
lowl          37      60                                    3
snowl         37      63                                    0
fow           57                                            0
clow          80      17                                    3
arow          13      17                                   10
cown         100       0                                    0
hown          97       3                                    0

Influence of the Initial Consonant-Vowel Digraph Unit

In contrast to the even distribution of pronunciation selections for both
pseudoword items ending in -ook is the inconsistency in assignment of
pronunciation to several other pseudoword items containing identical vowel
digraph-final consonant structures. For example, similar variation in
pronunciation might be expected for the three items ending in -ead, a unit
with an inconsistent orthographic neighborhood. Instead, the ea unit in the
pseudoword clead was rendered as /i/ 80% of the time, whereas in the item
stread it was similarly rendered only 60% of the time. It seems likely that
real words sharing the initial consonant-vowel digraph structure may be
biasing the pronunciation of the pseudoword, but a final determination must
await further study.

As indicated in Table 5, the consistency of pronunciation expected for
the ou unit in pseudowords ending in -oup, -oud, and -ound, considered to have
consistent neighborhoods and expected to be rendered as /u/, /au/, and /au/,
respectively, was not obtained. It may be that the paucity of words ending in
those structures in a third-grade child's reading vocabulary reduced the
saliency of the vowel digraph-final consonant unit, allowing the initial word
segment to influence pronunciation. For example, the pseudowords proup and
cloup were expected to be rendered on the basis of the reader's knowledge of
words such as soup and group. Instead, the ou unit was frequently rendered as
/au/. As an explanation of that result, we would suggest that words such as
proud and cloud, which share the exact initial consonant-vowel digraph unit
with proup and cloup, may have been activated and contributed to the
unexpected pronunciation.

In view of that result, the apparent saliency of the /ʌ/ pronunciation
for the ou unit in moung is particularly notable. Though that pronunciation
occurs in English only in the single word young, the ou unit embedded in the
pseudoword moung received the /ʌ/ pronunciation 70% of the time, despite
membership of the initial segment in a neighborhood containing mouth and
mountain. In contrast, the other pseudoword item containing the oung unit,
groung, received the /ʌ/ pronunciation only 57% of the time and the
pronunciation /au/ associated with the initial segment grou-, 50% of the time.

Mixed Influence

Additional evidence for the possibility that pronunciation selections
could be influenced by the initial consonant-vowel digraph unit was revealed
in the analysis of the ow unit in pseudowords. Any pseudoword containing the
ow unit, whether it ended a word or was combined with "l" as in -owl or "n" as
in -own, was considered to have an inconsistent orthographic neighborhood.
Pronunciations of pseudowords containing the ow unit reflected that
inconsistency, with the exception of the ow in the items cown and hown. The ow
unit in these words was rendered as /au/ in 100% and 97% of the cases,
respectively. In each of these instances, the initial word segment consisted
of a morpheme, the pronunciation of which was not overridden by the
pronunciation inconsistency of the final unit -own. In addition, words likely
to be present in a third-grade child's reading vocabulary, down, brown, and
town, provide identical pronunciations for the ow unit and share the -own
structure, probably accounting for the consistent rendering of these items.
Pseudowords containing the vowel digraph unit ie were all expected to
reflect their consistent orthographic neighborhoods. The designation of
consistency was based as always on the uniformity of rendering of the vowel
digraph-final consonant unit in similarly structured real words. However, in
the case of pseudowords containing the ie unit, this detection of neighborhood
consistency required that the reader respond to the affixation of plural and
past tense markers as a signal for the /ai/ pronunciation. The third-grade
readers in this study were able to identify the ie unit in the pseudoword
items kie and nie as /ai/; yet their pronunciations for similar items with the
plural or past tense marker were variable. For example, the ie unit in the
items bries and fied received the /ai/ pronunciation in between 50 and 70
percent of the cases only.

The ie unit in pseudowords ending in -ield, -iece, and -ief was expected
to be pronounced as /i/ on the basis of knowledge of such words as field,
piece, and chief. A review of the responses indicates that items ending in
these units received the /i/ pronunciation in between 47% and 70% of the
cases. Evidently, pronunciation preferences are being influenced by
experience or instruction, but the design of the stimuli did not allow us to
pinpoint the source of the variation in pronunciation of the ie unit in that
context.

Summary. The results of Experiment 2 provide support for the influence
of the vowel digraph-final consonant unit in determining the rendering of the
vowel in English-like pseudowords. The influence of this unit could be seen
in the greater uniformity of the pronunciation of pseudowords ending in
particular vowel digraph-final consonant units from consistent orthographic
neighborhoods. In instances where there was less uniformity in pronunciation
of such items, the influence of the initial segment appears to account for
most of the variability.



General Discussion

Children's acquisition of word reading skills was examined with particular
emphasis on the development of young readers' response to variant
vs. invariant phonologic associations for vowel digraph units, the use of the
final consonant context in disambiguating vowel assignment to invariant vowel
digraph units, and their sensitivity to the orthographic neighborhood
consistency of that vowel digraph-final consonant structure.

The data obtained in Experiment 1 indicate that the word reading accuracy
of the first-grade children was strongly affected by word frequency, but not
by the variation in pronunciation of the vowel digraph unit. This finding
supports the view expressed by Gough and Hillinger (1980) that initial
acquisition of word reading skills may typically be accomplished through rote
learning, with the result that frequently encountered words are usually
identified without analysis of word components.

The word reading accuracy of third and fifth graders was also affected by
word frequency, but in addition, the older readers read low-frequency words
containing vowel digraph units with invariant pronunciations with accuracy
comparable to that obtained for the high-frequency words. This effect is
consistent with results of earlier studies (Fowler et al., 1979; Venezky &
Johnson, 1973; Venezky & Massaro, 1979) demonstrating children's ability to
generalize knowledge of orthographic patterns beyond the words in which they
were originally encountered. In contrast, low-frequency words containing vowel
digraph units with variant pronunciations were a significant source of error
even for the older readers.

When these low-frequency words were further categorized by consistency or
inconsistency of their orthographic neighborhoods, those from consistent
orthographic neighborhoods were read by the third and fifth graders with a
level of accuracy close to that obtained for both high-frequency words and
those of low frequency that contained invariant vowel digraph units. For the
children in the higher grades, only the low-frequency words containing variant
vowel digraph units with inconsistent orthographic neighborhoods were a
substantial source of error. These results provide support for a model in
which the final consonant predicts vowel digraph pronunciation preferences
(Johnson & Venezky, 1976; Ryder & Pearson, 1980). They also support the
hypothesis (Glushko, 1979) that the ability to read the vowel in words is
affected by the consistency of pronunciation of words sharing a particular
medial vowel-final letter unit. Despite some exceptions, these findings speak
to the special salience of the vowel digraph-final consonant unit in
disambiguating vowel pronunciation.
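
The notion of orthographic neighborhood consistency used above can be made
concrete with a small sketch. The word list below is a toy lexicon assumed
for illustration only (it is not the stimulus set of either experiment), and
the function names are hypothetical. Consistency is computed here simply as
the share of neighbors that take the most common vowel pronunciation for a
given vowel digraph-final consonant unit; other definitions are possible.

```python
from collections import Counter

# Toy lexicon (an assumption for illustration, not the study's stimuli):
# each word is paired with the pronunciation of its "ou" digraph.
TOY_LEXICON = {
    "soup": "u", "group": "u",
    "loud": "au", "cloud": "au", "proud": "au",
    "couch": "au", "pouch": "au", "touch": "ʌ",
}

def rime(word, digraph="ou"):
    """Return the digraph plus everything after it, e.g. 'group' -> 'oup'."""
    return word[word.index(digraph):]

def neighborhood_consistency(unit, lexicon=TOY_LEXICON):
    """Share of words ending in `unit` that take its most common vowel."""
    votes = Counter(v for w, v in lexicon.items() if rime(w) == unit)
    if not votes:
        return None  # no neighbors in the lexicon
    return votes.most_common(1)[0][1] / sum(votes.values())
```

In this toy lexicon the -oup neighborhood is fully consistent, whereas -ouch
is not, because touch departs from couch and pouch.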

In the second experiment, pseudoword stimulus items were used to allow us 
to explore further the influence of the neighboring orthographic segments on 
vowel pronunciation. It was found that whereas the orthographic neighborhood 
consistency effect, as defined for medial vowel-final letter units, was
obtained for many pseudoword items, the pronunciation of others was not
disambiguated by the consistent pronunciation of the vowel digraph-final
consonant of that item. This result was observed on the items proup and cloup,
in which the ou unit was frequently pronounced as /au/, despite the
consistency of pronunciation evidenced by the -oup unit as it appears in real
words. Many of these exceptions were rationalized by considering possible
interference from initial consonant-vowel digraph occurrences in familiar real
words. These cases suggest that in future work it will be desirable to expand
the concept of neighborhood consistency to examine influences from the initial
portion of the word as well as of the final.

One possible explanation for the results is the operation of a
left-to-right letter string parser (Marcel, 1980). Marcel proposed that when
a word or pseudoword is presented to a reader, the letter string is segmented
in all possible ways. Each word segment, as it is parsed, automatically
activates the pronunciations of that unit as it occurs in different words.
Thus, for the young reader the pronunciation activated for the word segments
prou- and clou- may result from the words proud and cloud in their reading
vocabularies. Word pronunciation may result from the parsing of successive
units of the letter string, during which the pronunciation of later appearing
segments may override the pronunciation of prior segments (Baron & Strawson,
1976; Marcel, 1980). For the young reader, then, it may be that the
association between the unit ou and the /au/ pronunciation was too strong to
be overridden by the pronunciation of -oup as it appears in the words soup
and group.

The proposal put forth by Marcel (1980) also explains the pronunciation
of the ou unit in the item moung as /ʌ/. According to that explanation, as a
child attempts pronunciation of the pseudoword moung, the initial segment
parsed is mou-, the ou unit likely to be pronounced as /au/ on the basis of
knowledge of words such as mouth and mountain. When the child parses the
final segment of the letter string, -oung, however, a different pronunciation
for that unit is activated on the basis of the occurrence of that unit in the
word young. As it happened, the pronunciation of the ou unit in the pseudoword
moung was frequently /ʌ/, attesting to the strong effect the final word
segment maintains over word pronunciation.
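
The parsing account described above can be sketched in a few lines. This is
a deliberately reduced illustration, not Marcel's model: it considers only
two segments (the initial consonant-vowel digraph and the vowel digraph-final
consonant), the lexicon is a toy list assumed for the example, and the
function name is hypothetical. It shows which pronunciations each segment
would activate, in left-to-right order; whether the later segment overrides
the earlier one depends on association strength, which is not modeled here.

```python
# Toy reading vocabulary (an assumption for illustration): word -> vowel
# pronunciation of its "ou" digraph.
TOY_VOCABULARY = {
    "proud": "au", "cloud": "au", "mouth": "au", "mountain": "au",
    "soup": "u", "group": "u", "young": "ʌ",
}

def activated_pronunciations(item, vocabulary=TOY_VOCABULARY, digraph="ou"):
    """Return (from_initial, from_final): the vowel pronunciations activated
    by the initial segment (e.g. 'prou') and by the final segment (e.g. 'oup')."""
    i = item.index(digraph)
    initial = item[: i + len(digraph)]  # onset plus digraph, parsed first
    final = item[i:]                    # digraph plus final consonant(s)
    from_initial = sorted({v for w, v in vocabulary.items() if w.startswith(initial)})
    from_final = sorted({v for w, v in vocabulary.items() if w.endswith(final)})
    return from_initial, from_final

print(activated_pronunciations("proup"))
print(activated_pronunciations("moung"))
```

For both proup and moung the two segments activate conflicting vowels, which
is the situation in which the reported responses diverged: the initial
segment prevailed for proup, the final segment for moung.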

A left-to-right parser, with capacity to override and disambiguate
pronunciations activated for earlier segments of a word, would require that
the reader have a substantial reading vocabulary and awareness of the phonemic
segmentation of the words in the lexicon. It has been well documented not only
that phonemic awareness is a predictor of reading achievement (Blachman, 1983;
Bryant & Bradley, 1980; Liberman, 1973; Lundberg, Olofsson, & Wall, 1980), but
also that this awareness is enhanced by reading experience and instruction
(Liberman, Liberman, Mattingly, & Shankweiler, 1980; Morais, Cary, Alegria, &
Bertelson, 1979). We may speculate, therefore, that the limited reading
vocabularies of the first graders, in combination with underdeveloped phoneme
awareness and segmenting skills, effectively limit the amount of information
that most first graders are able to utilize in reading new words. As a
result, they were more likely to identify high-frequency words correctly than
low-frequency words, regardless of the number of alternate pronunciations for
the vowel digraph. Insensitive to orthographic neighborhood consistency or
inconsistency, the first-grade readers were unable to use the larger vowel
digraph-final consonant context to disambiguate vowel assignment to a vowel
digraph.

We must ask whether this result may be an artifact of instruction. All 
children participating in this study have received what is best identified as 
an eclectic approach to reading instruction. As reported, the third graders, 
and, even more so, the fifth graders, had developed a sensitivity to the 
orthographic neighborhood consistency, taking account of the wider vowel di- 
graph-final consonant context to disambiguate vowel assignment to vowel di- 
graphs. Apparently by the third grade, children who are progressing normally 
in reading have acquired a corpus of words in their reading vocabularies ade- 
quate to meet the demands of an operation that requires phoneme awareness, 
segmenting skill, and prior word knowledge to determine the pronunciation of 
an unfamiliar word. In contrast, the first graders, as they learn new words,
are just beginning to identify phoneme correspondences of individual graphemes
and may depend heavily on these to identify vowel digraphs. Thus, their
responses, though incorrect, include some substitutions that are possible in
certain other contexts.

This difference between the performances of the first and third graders 
raises critical questions for future investigation. We are interested to know
if, during that second year of formal reading instruction, children merely 
acquire a more extensive reading vocabulary in a rote manner, or if they begin 
then to analyze interword relations identifying consistencies between ortho- 
graphic structures larger than the individual letters and their pronunciation. 
Moreover, we should like to know whether different methods of instruction will 
make a difference in the development of these skills, and even whether there 
may be lasting effects of such instructional differences. In addition, our 
attention must turn to those older children who fail to acquire automatic word 
reading skills. Are these older, poorer readers functioning like the 
first-grade readers, or are they utilizing different information to determine
the pronunciation of an unfamiliar word? It is clear that the answers to 
these questions will further our understanding of reading and how it develops. 